Preface


The theory and practice of time series analysis have developed rapidly since the appearance
in 1970 of the seminal work of George E. P. Box and Gwilym M. Jenkins, Time
Series Analysis: Forecasting and Control, now available in its third edition (1994) with
co-author Gregory C. Reinsel. Many books on time series have appeared since then, but
some of them give too little practical application, while others give too little theoretical
background. This book attempts to present both application and theory at a level accessible
to a wide variety of students and practitioners. Our approach is to mix application
and theory throughout the book as they are naturally needed.

The  book  was  developed  for  a  one-semester  course  usually  attended  by  students  in 
statistics,  economics,  business,  engineering,  and  quantitative  social  sciences.  Basic 
applied  statistics  through  multiple  linear  regression  is  assumed.  Calculus  is  assumed 
only  to  the  extent  of  minimizing  sums  of  squares,  but  a  calculus-based  introduction  to 
statistics  is  necessary  for  a  thorough  understanding  of  some  of  the  theory.  However, 
required  facts  concerning  expectation,  variance,  covariance,  and  correlation  are 
reviewed  in  appendices.  Also,  conditional  expectation  properties  and  minimum  mean 
square  error  prediction  are  developed  in  appendices.  Actual  time  series  data  drawn  from 
various  disciplines  are  used  throughout  the  book  to  illustrate  the  methodology.  The  book 
contains  additional  topics  of  a  more  advanced  nature  that  can  be  selected  for  inclusion  in 
a  course  if  the  instructor  so  chooses. 

All  of  the  plots  and  numerical  output  displayed  in  the  book  have  been  produced 
with  the  R  software,  which  is  available  from  the  R  Project  for  Statistical  Computing  at 
www.r-project.org.  Some  of  the  numerical  output  has  been  edited  for  additional  clarity 
or  for  simplicity.  R  is  available  as  free  software  under  the  terms  of  the  Free  Software 
Foundation's GNU General Public License in source code form. It runs on a wide variety
of UNIX platforms and similar systems, Windows, and MacOS.

R is a language and environment for statistical computing and graphics. It provides a
wide variety of statistical (e.g., time-series analysis, linear and nonlinear modeling, classical
statistical tests) and graphical techniques and is highly extensible. The extensive
appendix, An Introduction to R, provides an introduction to the R software specially
designed to go with this book. One of the authors (KSC) has produced a large number of
new or enhanced R functions specifically tailored to the methods described in this book.
They are listed on page 468 and are available in the package named TSA on the R
Project’s Website at www.r-project.org. We have also constructed R command script
files for each chapter. These are available for download at
www.stat.uiowa.edu/~kchan/TSA.htm. We also show the required R code beneath nearly every
table and graphical display in the book. The datasets required for the exercises are named
in each exercise by an appropriate filename; for example, larain for the Los Angeles rainfall
data. However, if you are using the TSA package, the datasets are part of the package
and may be accessed through the R command data(larain), for example.

All  of  the  datasets  are  also  available  at  the  textbook  website  as  ASCII  files  with 
variable  names  in  the  first  row.  We  believe  that  many  of  the  plots  and  calculations 
described in the book could also be obtained with other software, such as SAS®, Splus®,
Statgraphics®,  SCA®,  EViews®,  RATS®,  Ox®,  and  others. 

This  book  is  a  second  edition  of  the  book  Time  Series  Analysis  by  Jonathan  Cryer, 
published  in  1986  by  PWS-Kent  Publishing  (Duxbury  Press).  This  new  edition  contains 
nearly all of the well-received original in addition to considerable new material, numerous
new datasets, and new exercises. Some of the new topics that are integrated with the
original include unit root tests, extended autocorrelation functions, subset ARIMA models,
and bootstrapping. Completely new chapters cover the topics of time series regression
models, time series models of heteroscedasticity, spectral analysis, and threshold
models.  Although  the  level  of  difficulty  in  these  new  chapters  is  somewhat  higher  than 
in  the  more  basic  material,  we  believe  that  the  discussion  is  presented  in  a  way  that  will 
make  the  material  accessible  and  quite  useful  to  a  broad  audience  of  users.  Chapter  15, 
Threshold  Models,  is  placed  last  since  it  is  the  only  chapter  that  deals  with  nonlinear 
time  series  models.  It  could  be  covered  earlier,  say  after  Chapter  12.  Also,  Chapters  13 
and  14  on  spectral  analysis  could  be  covered  after  Chapter  10. 

We  would  like  to  thank  John  Kimmel,  Executive  Editor,  Statistics,  at  Springer,  for 
his continuing interest and guidance during the long preparation of the manuscript. Professor
Howell Tong of the London School of Economics, Professor Henghsiu Tsai of
Academia Sinica, Taipei, Professor Noelle Samia of Northwestern University, Professor
W. K. Li and Professor Kai W. Ng, both of the University of Hong Kong, and Professor
Nils Christian Stenseth of the University of Oslo kindly read parts of the manuscript,
and Professor Jun Yan used a preliminary version of the text for a class at the University
of Iowa. Their constructive comments are greatly appreciated. We would like to thank
Samuel Hao, who helped with the exercise solutions and read the appendix, An Introduction
to R. We would also like to thank several anonymous reviewers who read the manuscript
at various stages. Their reviews led to a much improved book. Finally, one of the
authors  (JDC)  would  like  to  thank  Dan,  Marian,  and  Gene  for  providing  such  a  great 
place,  Casa  de  Artes,  Club  Santiago,  Mexico,  for  working  on  the  first  draft  of  much  of 
this  new  edition. 


Iowa  City,  Iowa 
January  2008 


Jonathan  D.  Cryer 
Kung-Sik  Chan 
Chapter 1

Introduction 


Data obtained from observations collected sequentially over time are extremely common.
In business, we observe weekly interest rates, daily closing stock prices, monthly
price indices, yearly sales figures, and so forth. In meteorology, we observe daily high
and low temperatures, annual precipitation and drought indices, and hourly wind
speeds. In agriculture, we record annual figures for crop and livestock production, soil
erosion, and export sales. In the biological sciences, we observe the electrical activity of
the heart at millisecond intervals. In ecology, we record the abundance of an animal species.
The list of areas in which time series are studied is virtually endless. The purpose
of time series analysis is generally twofold: to understand or model the stochastic mechanism
that gives rise to an observed series and to predict or forecast the future values of
a series based on the history of that series and, possibly, other related series or factors.

This  chapter  will  introduce  a  variety  of  examples  of  time  series  from  diverse  areas 
of  application.  A  somewhat  unique  feature  of  time  series  and  their  models  is  that  we 
usually cannot assume that the observations arise independently from a common population
(or from populations with different means, for example). Studying models that
incorporate  dependence  is  the  key  concept  in  time  series  analysis. 

1.1  Examples  of  Time  Series 


In  this  section,  we  introduce  a  number  of  examples  that  will  be  pursued  in  later  chapters. 

Annual  Rainfall  in  Los  Angeles 

Exhibit 1.1 displays a time series plot of the annual rainfall amounts recorded in Los
Angeles, California, over more than 100 years. The plot shows considerable variation in
rainfall amount over the years: some years are low, some high, and many are
in-between in value. The year 1883 was an exceptionally wet year for Los Angeles,
while 1983 was quite dry. For analysis and modeling purposes, we are interested in
whether or not consecutive years are related in some way. If so, we might be able to use
one year’s rainfall value to help forecast next year’s rainfall amount. One graphical way
to investigate that question is to pair up consecutive rainfall values and plot the resulting
scatterplot of pairs.

Exhibit 1.2 shows such a scatterplot for rainfall. For example, the point plotted near
the lower right-hand corner shows that the year of extremely high rainfall, 40 inches in
1883, was followed by a middle-of-the-road amount (about 12 inches) in 1884. The point
near the top of the display shows that the 40-inch year was preceded by a much more
typical year of about 15 inches.


Exhibit  1.1  Time  Series  Plot  of  Los  Angeles  Annual  Rainfall 


> library(TSA)
> win.graph(width=4.875, height=2.5, pointsize=8)
> data(larain); plot(larain, ylab='Inches', xlab='Year', type='o')


Exhibit  1.2  Scatterplot  of  LA  Rainfall  versus  Last  Year’s  LA  Rainfall 


> win.graph(width=3, height=3, pointsize=8)
> plot(y=larain, x=zlag(larain), ylab='Inches',
    xlab='Previous Year Inches')
The main impression that we obtain from this plot is that there is little if any information
about this year’s rainfall amount from last year’s amount. The plot shows no
“trends”  and  no  general  tendencies.  There  is  little  correlation  between  last  year’s  rainfall 
amount  and  this  year’s  amount.  From  a  modeling  or  forecasting  point  of  view,  this  is  not 
a  very  interesting  time  series! 

An  Industrial  Chemical  Process 

As  a  second  example,  we  consider  a  time  series  from  an  industrial  chemical  process. 
The  variable  measured  here  is  a  color  property  from  consecutive  batches  in  the  process. 
Exhibit  1.3  shows  a  time  series  plot  of  these  color  values.  Here  values  that  are  neighbors 
in  time  tend  to  be  similar  in  size.  It  seems  that  neighbors  are  related  to  one  another. 


Exhibit  1.3  Time  Series  Plot  of  Color  Property  from  a  Chemical  Process 


> win.graph(width=4.875, height=2.5, pointsize=8)
> data(color)
> plot(color, ylab='Color Property', xlab='Batch', type='o')


This  can  be  seen  better  by  constructing  the  scatterplot  of  neighboring  pairs  as  we 
did  with  the  first  example. 

Exhibit  1.4  displays  the  scatterplot  of  the  neighboring  pairs  of  color  values.  We  see 
a  slight  upward  trend  in  this  plot — low  values  tend  to  be  followed  in  the  next  batch  by 
low  values,  middle-sized  values  tend  to  be  followed  by  middle-sized  values,  and  high 
values  tend  to  be  followed  by  high  values.  The  trend  is  apparent  but  is  not  terribly 
strong.  For  example,  the  correlation  in  this  scatterplot  is  about  0.6. 
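
The pairing of each value with its predecessor can also be carried out without any add-on package. The following is a minimal sketch in base R, not the authors' code: the helper names `lag_pairs` and `lag1_cor` are hypothetical, and the built-in `LakeHuron` series is used only as a stand-in for the book's datasets.

```r
# Build (previous, current) pairs from a series and measure their correlation.
# lag_pairs() and lag1_cor() are illustrative helpers, not TSA functions.
lag_pairs <- function(x) {
  x <- as.numeric(x)
  n <- length(x)
  data.frame(previous = x[-n], current = x[-1])
}

lag1_cor <- function(x) {
  p <- lag_pairs(x)
  cor(p$previous, p$current)
}

# LakeHuron ships with R (package datasets); it is only a stand-in series.
pairs_lh <- lag_pairs(LakeHuron)
plot(pairs_lh$previous, pairs_lh$current,
     xlab = "Previous Value", ylab = "Current Value")
r1 <- lag1_cor(LakeHuron)
print(r1)
```

A value of `r1` near zero would correspond to a shapeless scatterplot such as the one for rainfall, while a clearly positive value corresponds to the upward drift seen for the color property.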
Exhibit 1.4 Scatterplot of Color Value versus Previous Color Value

> win.graph(width=3, height=3, pointsize=8)
> plot(y=color, x=zlag(color), ylab='Color Property',
    xlab='Previous Batch Color Property')

Annual  Abundance  of  Canadian  Hare 

Our  third  example  concerns  the  annual  abundance  of  Canadian  hare.  Exhibit  1.5  gives 
the  time  series  plot  of  this  abundance  over  about  30  years.  Neighboring  values  here  are 
very  closely  related.  Large  changes  in  abundance  do  not  occur  from  one  year  to  the  next. 
This neighboring correlation is seen clearly in Exhibit 1.6 where we have plotted abundance
versus the previous year’s abundance. As in the previous example, we see an
upward  trend  in  the  plot — low  values  tend  to  be  followed  by  low  values  in  the  next  year, 
middle-sized  values  by  middle-sized  values,  and  high  values  by  high  values. 


Exhibit  1.5  Abundance  of  Canadian  Hare 


> win.graph(width=4.875, height=2.5, pointsize=8)
> data(hare); plot(hare, ylab='Abundance', xlab='Year', type='o')


Exhibit 1.6 Hare Abundance versus Previous Year’s Hare Abundance

> win.graph(width=3, height=3, pointsize=8)
> plot(y=hare, x=zlag(hare), ylab='Abundance',
    xlab='Previous Year Abundance')
Monthly Average Temperatures in Dubuque, Iowa

The  average  monthly  temperatures  (in  degrees  Fahrenheit)  over  a  number  of  years 
recorded  in  Dubuque,  Iowa,  are  shown  in  Exhibit  1.7. 

Exhibit  1.7  Average  Monthly  Temperatures,  Dubuque,  Iowa 


> win.graph(width=4.875, height=2.5, pointsize=8)
> data(tempdub); plot(tempdub, ylab='Temperature', type='o')


This  time  series  displays  a  very  regular  pattern  called  seasonality.  Seasonality  for 
monthly values occurs when observations twelve months apart are related in some manner
or another. All Januarys and Februarys are quite cold, but they are similar in value
and  different  from  the  temperatures  of  the  warmer  months  of  June,  July,  and  August,  for 
example.  There  is  still  variation  among  the  January  values  and  variation  among  the  June 
values.  Models  for  such  series  must  accommodate  this  variation  while  preserving  the 
similarities.  Here  the  reason  for  the  seasonality  is  well  understood — the  Northern 
Hemisphere’s  changing  inclination  toward  the  sun. 
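
One way to see both the similarity within a month and the remaining variation is to summarize the series month by month. The sketch below is only an illustration in base R: it uses the built-in `nottem` series (monthly temperatures at Nottingham) as a stand-in for the Dubuque data, and the object names are assumptions of this example.

```r
# nottem: average monthly temperatures at Nottingham, 1920-1939 (built into R).
# It is used here only as a stand-in for the Dubuque series.
temps  <- as.numeric(nottem)
months <- as.integer(cycle(nottem))   # 1 = January, ..., 12 = December

# Average of all Januarys, all Februarys, and so on.
monthly_mean <- as.vector(tapply(temps, months, mean))
# Spread among the Januarys, among the Februarys, and so on.
monthly_sd   <- as.vector(tapply(temps, months, sd))
names(monthly_mean) <- month.abb
names(monthly_sd)   <- month.abb

print(round(rbind(mean = monthly_mean, sd = monthly_sd), 1))
```

The monthly means differ markedly between winter and summer, while the nonzero standard deviations show that values for the same month still vary from year to year.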

Monthly  Oil  Filter  Sales 

Our  last  example  for  this  chapter  concerns  the  monthly  sales  to  dealers  of  a  specialty  oil 
filter  for  construction  equipment  manufactured  by  John  Deere.  When  these  data  were 
first  presented  to  one  of  the  authors,  the  manager  said,  “There  is  no  reason  to  believe 
that  these  sales  are  seasonal.”  Seasonality  would  be  present  if  January  values  tended  to 
be related to other January values, February values tended to be related to other February
values, and so forth. The time series plot shown in Exhibit 1.8 is not designed to display
seasonality especially well. Exhibit 1.9 gives the same plot but amended to use
meaningful plotting symbols. In this plot, all January values are plotted with the character
J, all Februarys with F, all Marches with M, and so forth.¹ With these plotting symbols,
it is much easier to see that sales for the winter months of January and February all
tend to be high, while sales in September, October, November, and December are generally
quite low. The seasonality in the data is much easier to see from this modified time
series plot.


Exhibit  1.8  Monthly  Oil  Filter  Sales 


> data(oilfilters); plot(oilfilters, type='o', ylab='Sales')


Exhibit  1.9  Monthly  Oil  Filter  Sales  with  Special  Plotting  Symbols 


J=January (and June and July), F=February, M=March (and May), and so forth

> plot(oilfilters, type='l', ylab='Sales')
> points(y=oilfilters, x=time(oilfilters),
    pch=as.vector(season(oilfilters)))


1  In  reading  the  plot,  you  will  still  have  to  distinguish  between  Januarys,  Junes,  and  Julys, 
between  Marches  and  Mays,  and  Aprils  and  Augusts,  but  this  is  easily  done  by  looking  at 
neighboring  plotting  characters. 
In general, our goal is to emphasize plotting methods that are appropriate and useful
for finding patterns that will lead to suitable models for our time series data. In later
chapters,  we  will  consider  several  different  ways  to  incorporate  seasonality  into  time 
series  models. 
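
Readers without the TSA package can obtain similar monthly plotting symbols from base R alone. The following is a sketch under stated assumptions: the series is simulated rather than the oil filter data, and `month_symbols` is a hypothetical helper that plays the role of `season()` in the exhibit code.

```r
# Hypothetical monthly series (simulated; not the oil filter data).
set.seed(1)
x <- ts(rnorm(48, mean = 100, sd = 10), start = c(1984, 1), frequency = 12)

# First letter of the month for every observation; month_symbols() is an
# illustrative helper, not part of TSA.
month_symbols <- function(series) {
  substr(month.abb[as.integer(cycle(series))], 1, 1)
}

# Draw the connecting line first, then overlay one letter per observation.
plot(x, type = "l", ylab = "Value")
points(y = x, x = time(x), pch = month_symbols(x))
```

Here `cycle()` returns the position of each observation within the year, so the same approach carries over to any monthly `ts` object.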

1.2  A  Model-Building  Strategy 


Finding appropriate models for time series is a nontrivial task. We will develop a multistep
model-building strategy espoused so well by Box and Jenkins (1976). There are
three  main  steps  in  the  process,  each  of  which  may  be  used  several  times: 

1. model specification (or identification)

2.  model  fitting,  and 

3.  model  diagnostics 

In  model  specification  (or  identification),  the  classes  of  time  series  models  are 
selected  that  may  be  appropriate  for  a  given  observed  series.  In  this  step  we  look  at  the 
time  plot  of  the  series,  compute  many  different  statistics  from  the  data,  and  also  apply 
any  knowledge  of  the  subject  matter  in  which  the  data  arise,  such  as  biology,  business, 
or  ecology.  It  should  be  emphasized  that  the  model  chosen  at  this  point  is  tentative  and 
subject  to  revision  later  on  in  the  analysis. 

In  choosing  a  model,  we  shall  attempt  to  adhere  to  the  principle  of  parsimony;  that 
is,  the  model  used  should  require  the  smallest  number  of  parameters  that  will  adequately 
represent  the  time  series.  Albert  Einstein  is  quoted  in  Parzen  (1982,  p.  68)  as  remarking 
that  “everything  should  be  made  as  simple  as  possible  but  not  simpler.” 

The  model  will  inevitably  involve  one  or  more  parameters  whose  values  must  be 
estimated  from  the  observed  series.  Model  fitting  consists  of  finding  the  best  possible 
estimates  of  those  unknown  parameters  within  a  given  model.  We  shall  consider  criteria 
such  as  least  squares  and  maximum  likelihood  for  estimation. 

Model  diagnostics  is  concerned  with  assessing  the  quality  of  the  model  that  we 
have specified and estimated. How well does the model fit the data? Are the assumptions
of the model reasonably well satisfied? If no inadequacies are found, the modeling
may  be  assumed  to  be  complete,  and  the  model  may  be  used,  for  example,  to  forecast 
future  values.  Otherwise,  we  choose  another  model  in  the  light  of  the  inadequacies 
found;  that  is,  we  return  to  the  model  specification  step.  In  this  way,  we  cycle  through 
the  three  steps  until,  ideally,  an  acceptable  model  is  found. 
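
The three steps can be previewed in a few lines of R. This is only a hedged sketch using functions from the base `stats` package: the data are simulated, the autoregressive model and its coefficient 0.6 are assumptions made for illustration (such models are developed in later chapters), and the particular diagnostic shown is just one of many possible checks.

```r
set.seed(123)
# Simulated series standing in for observed data (assumed AR(1), coefficient 0.6).
y <- arima.sim(model = list(ar = 0.6), n = 500)

# 1. Specification: inspect the series and its sample autocorrelations.
plot(y, type = "o", ylab = "Simulated Series")
acf(y)

# 2. Fitting: estimate the parameters of the tentative model by maximum likelihood.
fit <- arima(y, order = c(1, 0, 0), method = "ML")

# 3. Diagnostics: the residuals should look like uncorrelated noise.
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)
print(fit)
print(lb)
```

If the diagnostic step pointed to remaining structure in the residuals, the analyst would return to step 1 with a revised tentative model.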

Because  the  computations  required  for  each  step  in  model  building  are  intensive, 
we  shall  rely  on  readily  available  statistical  software  to  carry  out  the  calculations  and  do 
the  plotting. 

1.3  Time  Series  Plots  in  History 


According to Tufte (1983, p. 28), “The time-series plot is the most frequently used form
of graphic design. With one dimension marching along to the regular rhythm of seconds,
minutes, hours, days, weeks, months, years, or millennia, the natural ordering of
the  time  scale  gives  this  design  a  strength  and  efficiency  of  interpretation  found  in  no 
other  graphic  arrangement.” 

Exhibit  1.10  reproduces  what  appears  to  be  the  oldest  known  example  of  a  time 
series plot, dating from the tenth (or possibly eleventh) century and showing the inclinations
of the planetary orbits.† Commenting on this artifact, Tufte says, “It appears as a
mysterious  and  isolated  wonder  in  the  history  of  data  graphics,  since  the  next  extant 
graphic  of  a  plotted  time-series  shows  up  some  800  years  later.” 


Exhibit  1.10  A  Tenth-Century  Time  Series  Plot 




1.4  An  Overview  of  the  Book 


Chapter  2  develops  the  basic  ideas  of  mean,  covariance,  and  correlation  functions  and 
ends  with  the  important  concept  of  stationarity.  Chapter  3  discusses  trend  analysis  and 
investigates  how  to  estimate  and  check  common  deterministic  trend  models,  such  as 
those  for  linear  time  trends  and  seasonal  means. 

Chapter  4  begins  the  development  of  parametric  models  for  stationary  time  series, 
namely  the  so-called  autoregressive  moving  average  (ARMA)  models  (also  known  as 
Box-Jenkins  models).  These  models  are  then  generalized  in  Chapter  5  to  encompass 
certain  types  of  stochastic  nonstationary  cases — the  ARIMA  models. 

Chapters 6, 7, and 8 form the heart of the model-building strategy for ARIMA modeling.
Techniques are presented for tentatively specifying models (Chapter 6), efficiently
estimating the model parameters using least squares and maximum likelihood
(Chapter  7),  and  determining  how  well  the  models  fit  the  data  (Chapter  8). 

Chapter 9 thoroughly develops the theory and methods of minimum mean square
error forecasting for ARIMA models. Chapter 10 extends the ideas of Chapters 4
through 9 to stochastic seasonal models. The remaining chapters cover selected topics
and are of a somewhat more advanced nature.

† From Tufte (1983, p. 28).


Exercises 


1.1  Use  software  to  produce  the  time  series  plot  shown  in  Exhibit  1.2,  on  page  2.  The 
data are in the file named larain.¹

1.2  Produce  the  time  series  plot  displayed  in  Exhibit  1.3,  on  page  3.  The  data  file  is 
named  color. 

1.3 Simulate a completely random process of length 48 with independent, normal values.
Display the time series plot. Does it look “random”? Repeat this exercise several
times  with  a  new  simulation  each  time. 

1.4  Simulate  a  completely  random  process  of  length  48  with  independent,  chi-square 
distributed  values,  each  with  2  degrees  of  freedom.  Display  the  time  series  plot. 
Does  it  look  “random”  and  nonnormal?  Repeat  this  exercise  several  times  with  a 
new  simulation  each  time. 

1.5 Simulate a completely random process of length 48 with independent, t-distributed
values, each with 5 degrees of freedom. Construct the time series plot. Does it
look  “random”  and  nonnormal?  Repeat  this  exercise  several  times  with  a  new 
simulation  each  time. 

1.6 Construct a time series plot with monthly plotting symbols for the Dubuque temperature
series as in Exhibit 1.7, on page 6. The data are in the file named tempdub.


1  If  you  have  installed  the  R  package  TSA,  available  for  download  at  www.r-project.org,  the 
larain  data  are  accessed  by  the  R  command:  data(larain).  An  ASCII  file  of  the  data  is  also 
available  on  the  book  Website  at  www.stat.uiowa.edu/~kchan/TSA.htm. 


Chapter  2 

Fundamental  Concepts 


This  chapter  describes  the  fundamental  concepts  in  the  theory  of  time  series  models.  In 
particular, we introduce the concepts of stochastic processes, mean and covariance functions,
stationary processes, and autocorrelation functions.

2.1  Time  Series  and  Stochastic  Processes 


The sequence of random variables $\{Y_t : t = 0, \pm 1, \pm 2, \pm 3, \ldots\}$ is called a stochastic
process and serves as a model for an observed time series. It is known that the complete
probabilistic structure of such a process is determined by the set of distributions of all
finite collections of the $Y$'s. Fortunately, we will not have to deal explicitly with these
multivariate distributions. Much of the information in these joint distributions can be
described in terms of means, variances, and covariances. Consequently, we concentrate
our efforts on these first and second moments. (If the joint distributions of the $Y$'s are
multivariate normal distributions, then the first and second moments completely determine
all the joint distributions.)

2.2  Means,  Variances,  and  Covariances 


For a stochastic process $\{Y_t : t = 0, \pm 1, \pm 2, \pm 3, \ldots\}$, the mean function is defined by

$$\mu_t = E(Y_t) \quad \text{for } t = 0, \pm 1, \pm 2, \ldots \tag{2.2.1}$$

That is, $\mu_t$ is just the expected value of the process at time $t$. In general, $\mu_t$ can be different
at each time point $t$.

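The autocovariance function, $\gamma_{t,s}$, is defined as

$$\gamma_{t,s} = \operatorname{Cov}(Y_t, Y_s) \quad \text{for } t, s = 0, \pm 1, \pm 2, \ldots \tag{2.2.2}$$

where $\operatorname{Cov}(Y_t, Y_s) = E[(Y_t - \mu_t)(Y_s - \mu_s)] = E(Y_t Y_s) - \mu_t \mu_s$.

The autocorrelation function, $\rho_{t,s}$, is given by

$$\rho_{t,s} = \operatorname{Corr}(Y_t, Y_s) \quad \text{for } t, s = 0, \pm 1, \pm 2, \ldots \tag{2.2.3}$$

where

$$\operatorname{Corr}(Y_t, Y_s) = \frac{\operatorname{Cov}(Y_t, Y_s)}{\sqrt{\operatorname{Var}(Y_t)\operatorname{Var}(Y_s)}} = \frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\,\gamma_{s,s}}} \tag{2.2.4}$$
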
We review the basic properties of expectation, variance, covariance, and correlation
in  Appendix  A  on  page  24. 

Recall  that  both  covariance  and  correlation  are  measures  of  the  (linear)  dependence 
between random variables but that the unitless correlation is somewhat easier to interpret.
The following important properties follow from known results and our definitions:


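$$\left.\begin{aligned}
\gamma_{t,t} &= \operatorname{Var}(Y_t) &\qquad \rho_{t,t} &= 1 \\
\gamma_{t,s} &= \gamma_{s,t} & \rho_{t,s} &= \rho_{s,t} \\
|\gamma_{t,s}| &\le \sqrt{\gamma_{t,t}\,\gamma_{s,s}} & |\rho_{t,s}| &\le 1
\end{aligned}\right\} \tag{2.2.5}$$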


Values of $\rho_{t,s}$ near $\pm 1$ indicate strong (linear) dependence, whereas values near zero
indicate weak (linear) dependence. If $\rho_{t,s} = 0$, we say that $Y_t$ and $Y_s$ are uncorrelated.

To investigate the covariance properties of various time series models, the following
result will be used repeatedly: If $c_1, c_2, \ldots, c_m$ and $d_1, d_2, \ldots, d_n$ are constants and
$t_1, t_2, \ldots, t_m$ and $s_1, s_2, \ldots, s_n$ are time points, then

$$\operatorname{Cov}\left[\sum_{i=1}^{m} c_i Y_{t_i},\; \sum_{j=1}^{n} d_j Y_{s_j}\right] = \sum_{i=1}^{m} \sum_{j=1}^{n} c_i d_j \operatorname{Cov}(Y_{t_i}, Y_{s_j}) \tag{2.2.6}$$



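The proof of Equation (2.2.6), though tedious, is a straightforward application of
the linear properties of expectation. As a special case, we obtain the well-known result

$$\operatorname{Var}\left[\sum_{i=1}^{n} c_i Y_{t_i}\right] = \sum_{i=1}^{n} c_i^2 \operatorname{Var}(Y_{t_i}) + 2 \sum_{i=2}^{n} \sum_{j=1}^{i-1} c_i c_j \operatorname{Cov}(Y_{t_i}, Y_{t_j}) \tag{2.2.7}$$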
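
The rule in Equation (2.2.6) can be checked numerically, because sample covariances obey the same linearity as theoretical ones. The sketch below is an illustration only: the data are simulated, and the constants, index sets, and object names are arbitrary choices made for this example. It also checks the special case for a variance, in which the variance of a linear combination expands into the weighted variances plus twice the weighted pairwise covariances.

```r
set.seed(42)
# Hypothetical data: 200 draws of five correlated variables Y1, ..., Y5.
Y <- matrix(rnorm(200 * 5), ncol = 5) %*% matrix(runif(25), 5, 5)

t_idx <- 1:3; c_vec <- c(1, -2, 0.5)   # constants c_i attached to Y_{t_i}
s_idx <- 3:5; d_vec <- c(0.3, 1, 2)    # constants d_j attached to Y_{s_j}

# Left side of (2.2.6): covariance of the two linear combinations.
lhs <- as.numeric(cov(Y[, t_idx] %*% c_vec, Y[, s_idx] %*% d_vec))

# Right side: double sum of c_i * d_j * Cov(Y_{t_i}, Y_{s_j}).
C <- cov(Y[, t_idx], Y[, s_idx])       # 3 x 3 matrix of sample covariances
rhs <- sum(outer(c_vec, d_vec) * C)

# Special case: variance of a single linear combination.
V <- cov(Y[, t_idx])
var_lhs <- as.numeric(var(Y[, t_idx] %*% c_vec))
var_rhs <- sum(c_vec^2 * diag(V))
for (i in 2:3) for (j in 1:(i - 1)) {
  var_rhs <- var_rhs + 2 * c_vec[i] * c_vec[j] * V[i, j]
}

print(c(lhs = lhs, rhs = rhs, var_lhs = var_lhs, var_rhs = var_rhs))
```

The two sides agree up to floating-point rounding, whatever constants and index sets are chosen.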


The  Random  Walk 


Let $e_1, e_2, \ldots$ be a sequence of independent, identically distributed random variables
each with zero mean and variance $\sigma_e^2$. The observed time series, $\{Y_t : t = 1, 2, \ldots\}$, is
constructed as follows:

$$\left.\begin{aligned}
Y_1 &= e_1 \\
Y_2 &= e_1 + e_2 \\
&\;\;\vdots \\
Y_t &= e_1 + e_2 + \cdots + e_t
\end{aligned}\right\} \tag{2.2.8}$$

Alternatively, we can write

$$Y_t = Y_{t-1} + e_t \tag{2.2.9}$$

with “initial condition” $Y_1 = e_1$. If the $e$'s are interpreted as the sizes of the “steps” taken
(forward or backward) along a number line, then $Y_t$ is the position of the “random
walker” at time $t$. From Equation (2.2.8), we obtain the mean function
$$\mu_t = E(Y_t) = E(e_1 + e_2 + \cdots + e_t) = E(e_1) + E(e_2) + \cdots + E(e_t) = 0 + 0 + \cdots + 0$$

so that

$$\mu_t = 0 \quad \text{for all } t \tag{2.2.10}$$

We  also  have 

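$$\operatorname{Var}(Y_t) = \operatorname{Var}(e_1 + e_2 + \cdots + e_t) = \operatorname{Var}(e_1) + \operatorname{Var}(e_2) + \cdots + \operatorname{Var}(e_t) = \sigma_e^2 + \sigma_e^2 + \cdots + \sigma_e^2$$

so that

$$\operatorname{Var}(Y_t) = t\sigma_e^2 \tag{2.2.11}$$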

Notice  that  the  process  variance  increases  linearly  with  time. 

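To investigate the covariance function, suppose that $1 \le t \le s$. Then we have

$$\gamma_{t,s} = \operatorname{Cov}(Y_t, Y_s) = \operatorname{Cov}(e_1 + e_2 + \cdots + e_t,\; e_1 + e_2 + \cdots + e_t + e_{t+1} + \cdots + e_s)$$

From Equation (2.2.6), we have

$$\gamma_{t,s} = \sum_{i=1}^{s} \sum_{j=1}^{t} \operatorname{Cov}(e_i, e_j)$$

However, these covariances are zero unless $i = j$, in which case they equal $\operatorname{Var}(e_i) = \sigma_e^2$.
There are exactly $t$ of these so that $\gamma_{t,s} = t\sigma_e^2$.

Since $\gamma_{t,s} = \gamma_{s,t}$, this specifies the autocovariance function for all time points $t$ and $s$
and we can write

$$\gamma_{t,s} = t\sigma_e^2 \quad \text{for } 1 \le t \le s \tag{2.2.12}$$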

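The autocorrelation function for the random walk is now easily obtained as

$$\rho_{t,s} = \frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\,\gamma_{s,s}}} = \sqrt{\frac{t}{s}} \quad \text{for } 1 \le t \le s \tag{2.2.13}$$

The following numerical values help us understand the behavior of the random
walk.

$$\begin{aligned}
\rho_{1,2} &= \sqrt{\tfrac{1}{2}} = 0.707 &\qquad \rho_{8,9} &= \sqrt{\tfrac{8}{9}} = 0.943 \\
\rho_{24,25} &= \sqrt{\tfrac{24}{25}} = 0.980 & \rho_{1,25} &= \sqrt{\tfrac{1}{25}} = 0.200
\end{aligned}$$
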

The values of $Y$ at neighboring time points are more and more strongly and positively
correlated as time goes by. On the other hand, the values of $Y$ at distant time
points are less and less correlated.
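
These values are easy to reproduce. Since the random walk has autocovariance $t\sigma_e^2$ for $1 \le t \le s$ and variance $t\sigma_e^2$, its autocorrelation is $\sqrt{t/s}$. The following sketch evaluates that expression and compares two of the values with a Monte Carlo estimate; the helper name `rw_autocorr`, the number of replications, and the seed are assumptions of this example.

```r
# Theoretical autocorrelation of the random walk: sqrt(t/s) for 1 <= t <= s.
# rw_autocorr() is an illustrative helper name.
rw_autocorr <- function(t, s) sqrt(pmin(t, s) / pmax(t, s))

# The four pairs of time points discussed in the text.
theory <- round(rw_autocorr(c(1, 8, 24, 1), c(2, 9, 25, 25)), 3)
print(theory)

# Monte Carlo check: simulate many walks and correlate positions across walks.
set.seed(2008)
walks <- replicate(5000, cumsum(rnorm(25)))  # 25 x 5000; each column is one walk
mc_8_9  <- cor(walks[8, ], walks[9, ])       # neighboring time points
mc_1_25 <- cor(walks[1, ], walks[25, ])      # distant time points
print(c(mc_8_9 = mc_8_9, mc_1_25 = mc_1_25))
```

The simulated correlations should fall close to the theoretical values, with the neighboring pair strongly correlated and the distant pair only weakly so.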

A simulated random walk is shown in Exhibit 2.1 where the $e$'s were selected from
a standard normal distribution. Note that even though the theoretical mean function is
zero for all time points, the fact that the variance increases over time and that the correlation
between process values nearby in time is nearly 1 indicate that we should expect
long excursions of the process away from the mean level of zero.

The simple random walk process provides a good model (at least to a first approximation)
for phenomena as diverse as the movement of common stock prices and the
position of small particles suspended in a fluid (so-called Brownian motion).


Exhibit  2.1  Time  Series  Plot  of  a  Random  Walk 


> win.graph(width=4.875, height=2.5, pointsize=8)
> data(rwalk) # rwalk contains a simulated random walk
> plot(rwalk, type='o', ylab='Random Walk')
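
Readers who wish to generate their own random walk rather than load the stored one can do so in base R. This is a minimal sketch, not the code used to create the rwalk dataset: the length 60 and the seed are arbitrary choices, and the object names are assumptions of this example.

```r
set.seed(1)
n <- 60                  # arbitrary length chosen for this illustration
e <- rnorm(n)            # steps e_1, ..., e_n: iid standard normal
y <- cumsum(e)           # Y_t = e_1 + e_2 + ... + e_t

# The recursion Y_t = Y_{t-1} + e_t with Y_1 = e_1 gives the same path.
y_rec <- numeric(n)
y_rec[1] <- e[1]
for (t in 2:n) y_rec[t] <- y_rec[t - 1] + e[t]

plot(ts(y), type = "o", ylab = "Random Walk")
```

Rerunning the code with different seeds shows how varied the long excursions away from zero can be.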


A  Moving  Average 

As a second example, suppose that $\{Y_t\}$ is constructed as

$$Y_t = \frac{e_t + e_{t-1}}{2} \tag{2.2.14}$$

where (as always throughout this book) the $e$'s are assumed to be independent and identically
distributed with zero mean and variance $\sigma_e^2$. Here


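$$\mu_t = E(Y_t) = E\left\{\frac{e_t + e_{t-1}}{2}\right\} = \frac{E(e_t) + E(e_{t-1})}{2} = 0$$

and

$$\operatorname{Var}(Y_t) = \operatorname{Var}\left\{\frac{e_t + e_{t-1}}{2}\right\} = \frac{\operatorname{Var}(e_t) + \operatorname{Var}(e_{t-1})}{4} = 0.5\sigma_e^2$$

Also

$$\begin{aligned}
\operatorname{Cov}(Y_t, Y_{t-1}) &= \operatorname{Cov}\left\{\frac{e_t + e_{t-1}}{2},\; \frac{e_{t-1} + e_{t-2}}{2}\right\} \\
&= \frac{\operatorname{Cov}(e_t, e_{t-1}) + \operatorname{Cov}(e_t, e_{t-2}) + \operatorname{Cov}(e_{t-1}, e_{t-1})}{4} + \frac{\operatorname{Cov}(e_{t-1}, e_{t-2})}{4} \\
&= \frac{\operatorname{Cov}(e_{t-1}, e_{t-1})}{4} \quad \text{(as all the other covariances are zero)} \\
&= 0.25\sigma_e^2
\end{aligned}$$

or

$$\gamma_{t,t-1} = 0.25\sigma_e^2 \quad \text{for all } t$$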

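Furthermore,

$$Cov(Y_t, Y_{t-2}) = Cov\left\{\frac{e_t + e_{t-1}}{2}, \frac{e_{t-2} + e_{t-3}}{2}\right\} = 0$$

since the $e$'s are independent. Similarly, $Cov(Y_t, Y_{t-k}) = 0$ for $k > 1$, so we may write

$$\gamma_{t,s} = \begin{cases}
0.5\sigma_e^2 & \text{for } |t - s| = 0 \\
0.25\sigma_e^2 & \text{for } |t - s| = 1 \\
0 & \text{for } |t - s| > 1
\end{cases} \tag{2.2.15}$$

For the autocorrelation function, we have

$$\rho_{t,s} = \begin{cases}
1 & \text{for } |t - s| = 0 \\
0.5 & \text{for } |t - s| = 1 \\
0 & \text{for } |t - s| > 1
\end{cases} \tag{2.2.16}$$

since $0.25\sigma_e^2 / 0.5\sigma_e^2 = 0.5$.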
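
The covariance calculation above follows a general pattern: if a process is a weighted sum of current and past white noise terms, its autocovariance at lag $k$ is $\sigma_e^2$ times the sum of products of weights $k$ positions apart. The following R sketch applies that rule to the weights $(0.5, 0.5)$ of the moving average; `filter_acvf` is a hypothetical helper name introduced here for illustration, not a function from the book or its software.

```r
# Autocovariance of Y_t = sum_j w[j] * e_{t-j+1} built from white noise
# with variance sigma2: gamma_k = sigma2 * sum_j w[j] * w[j + k].
# filter_acvf is a hypothetical helper name, not a function from the book.
filter_acvf <- function(w, sigma2 = 1, max_lag = 3) {
  q <- length(w)
  sapply(0:max_lag, function(k) {
    if (k >= q) return(0)
    sigma2 * sum(w[1:(q - k)] * w[(1 + k):q])
  })
}

gamma <- filter_acvf(c(0.5, 0.5))   # weights of (e_t + e_{t-1}) / 2
rho <- gamma / gamma[1]             # autocorrelations at lags 0, 1, 2, 3
print(rbind(lag = 0:3, gamma = gamma, rho = rho))
```

With $\sigma_e^2 = 1$ the `gamma` row should match (2.2.15) and the `rho` row should match (2.2.16).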

Notice that $\rho_{2,1} = \rho_{3,2} = \rho_{4,3} = \rho_{9,8} = 0.5$. Values of $Y$ precisely one time unit apart
have exactly the same correlation no matter where they occur in time. Furthermore,
$\rho_{3,1} = \rho_{4,2} = \rho_{t,t-2}$ and, more generally, $\rho_{t,t-k}$ is the same for all values of $t$. This leads us to
the important concept of stationarity.

2.3 Stationarity


To make statistical inferences about the structure of a stochastic process on the basis of
an observed record of that process, we must usually make some simplifying (and presumably
reasonable) assumptions about that structure. The most important such
assumption is that of stationarity. The basic idea of stationarity is that the probability
laws that govern the behavior of the process do not change over time. In a sense, the process
is in statistical equilibrium. Specifically, a process $\{Y_t\}$ is said to be strictly stationary
if the joint distribution of $Y_{t_1}, Y_{t_2}, \ldots, Y_{t_n}$ is the same as the joint distribution of
$Y_{t_1 - k}, Y_{t_2 - k}, \ldots, Y_{t_n - k}$ for all choices of time points $t_1, t_2, \ldots, t_n$ and all choices of time
lag $k$.

Thus, when $n = 1$ the (univariate) distribution of $Y_t$ is the same as that of $Y_{t-k}$ for
all $t$ and $k$; in other words, the $Y$'s are (marginally) identically distributed. It then follows
that $E(Y_t) = E(Y_{t-k})$ for all $t$ and $k$ so that the mean function is constant for all time.
Additionally, $Var(Y_t) = Var(Y_{t-k})$ for all $t$ and $k$ so that the variance is also constant over
time.

Setting $n = 2$ in the stationarity definition we see that the bivariate distribution of $Y_t$
and $Y_s$ must be the same as that of $Y_{t-k}$ and $Y_{s-k}$, from which it follows that
$Cov(Y_t, Y_s) = Cov(Y_{t-k}, Y_{s-k})$ for all $t$, $s$, and $k$. Putting $k = s$ and then $k = t$, we obtain

$$\begin{aligned}
\gamma_{t,s} &= Cov(Y_{t-s}, Y_0) \\
&= Cov(Y_0, Y_{s-t}) \\
&= Cov(Y_0, Y_{|t-s|}) \\
&= \gamma_{0,|t-s|}
\end{aligned}$$


That is, the covariance between $Y_t$ and $Y_s$ depends on time only through the time difference
$|t - s|$ and not otherwise on the actual times $t$ and $s$. Thus, for a stationary process,
we can simplify our notation and write

$$\gamma_k = Cov(Y_t, Y_{t-k}) \quad \text{and} \quad \rho_k = Corr(Y_t, Y_{t-k}) \tag{2.3.1}$$

Note also that

$$\rho_k = \frac{\gamma_k}{\gamma_0}$$

The general properties given in Equation (2.2.5) now become

$$\begin{array}{ll}
\gamma_0 = Var(Y_t) & \rho_0 = 1 \\
\gamma_k = \gamma_{-k} & \rho_k = \rho_{-k} \\
|\gamma_k| \le \gamma_0 & |\rho_k| \le 1
\end{array} \tag{2.3.2}$$


If a process is strictly stationary and has finite variance, then the covariance function
must depend only on the time lag.

A definition that is similar to that of strict stationarity but is mathematically weaker
is the following: A stochastic process $\{Y_t\}$ is said to be weakly (or second-order)
stationary if

1. The mean function is constant over time, and

2. $\gamma_{t,t-k} = \gamma_{0,k}$ for all time $t$ and lag $k$.

In this book the term stationary when used alone will always refer to this weaker form of
stationarity. However, if the joint distributions for the process are all multivariate normal
distributions, it can be shown that the two definitions coincide. For stationary processes,
we usually only consider $k \ge 0$.

White  Noise 

A very important example of a stationary process is the so-called white noise process,
which is defined as a sequence of independent, identically distributed random variables
$\{e_t\}$. Its importance stems not from the fact that it is an interesting model itself but from
the fact that many useful processes can be constructed from white noise. The fact that
$\{e_t\}$ is strictly stationary is easy to see since

$$\begin{aligned}
& Pr(e_{t_1} \le x_1, e_{t_2} \le x_2, \ldots, e_{t_n} \le x_n) \\
&\quad = Pr(e_{t_1} \le x_1) Pr(e_{t_2} \le x_2) \cdots Pr(e_{t_n} \le x_n) && \text{(by independence)} \\
&\quad = Pr(e_{t_1 - k} \le x_1) Pr(e_{t_2 - k} \le x_2) \cdots Pr(e_{t_n - k} \le x_n) && \text{(identical distributions)} \\
&\quad = Pr(e_{t_1 - k} \le x_1, e_{t_2 - k} \le x_2, \ldots, e_{t_n - k} \le x_n) && \text{(by independence)}
\end{aligned}$$

as required. Also, $\mu_t = E(e_t)$ is constant and

$$\gamma_k = \begin{cases}
Var(e_t) & \text{for } k = 0 \\
0 & \text{for } k \ne 0
\end{cases}$$

Alternatively, we can write

$$\rho_k = \begin{cases}
1 & \text{for } k = 0 \\
0 & \text{for } k \ne 0
\end{cases} \tag{2.3.3}$$


The term white noise arises from the fact that a frequency analysis of the model shows
that, in analogy with white light, all frequencies enter equally. We usually assume that
the white noise process has mean zero and denote $Var(e_t)$ by $\sigma_e^2$.

The moving average example, on page 14, where $Y_t = (e_t + e_{t-1})/2$, is another
example of a stationary process constructed from white noise. In our new notation, we
have for the moving average process that

$$\rho_k = \begin{cases}
1 & \text{for } k = 0 \\
0.5 & \text{for } |k| = 1 \\
0 & \text{for } |k| \ge 2
\end{cases}$$

Random Cosine Wave


As a somewhat different example,† consider the process defined as follows:

$$Y_t = \cos\left[2\pi\left(\frac{t}{12} + \Phi\right)\right] \quad \text{for } t = 0, \pm 1, \pm 2, \ldots$$

where $\Phi$ is selected (once) from a uniform distribution on the interval from 0 to 1. A
sample from such a process will appear highly deterministic since $Y_t$ will repeat itself
identically every 12 time units and look like a perfect (discrete time) cosine curve. However,
its maximum will not occur at $t = 0$ but will be determined by the random phase $\Phi$.
The phase $\Phi$ can be interpreted as the fraction of a complete cycle completed by time
$t = 0$. Still, the statistical properties of this process can be computed as follows:


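$$\begin{aligned}
E(Y_t) &= E\left\{\cos\left[2\pi\left(\frac{t}{12} + \Phi\right)\right]\right\} \\
&= \int_0^1 \cos\left[2\pi\left(\frac{t}{12} + \phi\right)\right] d\phi \\
&= \frac{1}{2\pi} \sin\left[2\pi\left(\frac{t}{12} + \phi\right)\right] \Bigg|_{\phi = 0}^{1} \\
&= \frac{1}{2\pi}\left[\sin\left(2\pi\frac{t}{12} + 2\pi\right) - \sin\left(2\pi\frac{t}{12}\right)\right]
\end{aligned}$$

But this is zero since the sines must agree. So $\mu_t = 0$ for all $t$.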
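Also

$$\begin{aligned}
\gamma_{t,s} &= E\left\{\cos\left[2\pi\left(\frac{t}{12} + \Phi\right)\right] \cos\left[2\pi\left(\frac{s}{12} + \Phi\right)\right]\right\} \\
&= \int_0^1 \cos\left[2\pi\left(\frac{t}{12} + \phi\right)\right] \cos\left[2\pi\left(\frac{s}{12} + \phi\right)\right] d\phi \\
&= \frac{1}{2} \int_0^1 \left\{\cos\left[2\pi\left(\frac{t - s}{12}\right)\right] + \cos\left[2\pi\left(\frac{t + s}{12} + 2\phi\right)\right]\right\} d\phi \\
&= \frac{1}{2} \left\{\cos\left[2\pi\left(\frac{t - s}{12}\right)\right] + \frac{1}{4\pi} \sin\left[2\pi\left(\frac{t + s}{12} + 2\phi\right)\right] \Bigg|_{\phi = 0}^{1}\right\} \\
&= \frac{1}{2} \cos\left[2\pi\left(\frac{|t - s|}{12}\right)\right]
\end{aligned}$$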

So the process is stationary with autocorrelation function

$$\rho_k = \cos\left(2\pi\frac{k}{12}\right) \quad \text{for } k = 0, \pm 1, \pm 2, \ldots \tag{2.3.4}$$

† This example contains optional material that is not needed in order to understand most of
the remainder of this book. It will be used in Chapter 13, Introduction to Spectral Analysis.

This  example  suggests  that  it  will  be  difficult  to  assess  whether  or  not  stationarity  is 
a  reasonable  assumption  for  a  given  time  series  on  the  basis  of  the  time  sequence  plot  of 
the  observed  data. 
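
The two integrals in the random cosine wave example can be checked numerically. The R sketch below is an illustration under stated assumptions: the helper names `cos_wave`, `mean_fn`, and `acvf_fn`, the time points 7 and 3, and the seed are all arbitrary choices, not part of the original text. It uses `integrate()` from base R to average over the uniform phase, then plots a single realization.

```r
# Numerically check the random cosine wave moments by integrating over the
# uniform phase on (0, 1). The helper names below are illustrative only.
cos_wave <- function(t, phi) cos(2 * pi * (t / 12 + phi))

mean_fn <- function(t) {
  integrate(function(phi) cos_wave(t, phi), lower = 0, upper = 1)$value
}
acvf_fn <- function(t, s) {
  integrate(function(phi) cos_wave(t, phi) * cos_wave(s, phi),
            lower = 0, upper = 1)$value
}

t <- 7; s <- 3   # arbitrary time points chosen for illustration
print(c(mean = mean_fn(t),
        gamma = acvf_fn(t, s),
        formula = 0.5 * cos(2 * pi * (t - s) / 12)))

# One realization: a single random phase gives a perfectly regular curve.
set.seed(1)
phi <- runif(1)
plot(0:36, cos_wave(0:36, phi), type = 'o', xlab = 'Time', ylab = 'Y')
```

The plotted path looks entirely deterministic, which is the point of the example: stationarity is a property of the ensemble over phases, not something visible in one realization.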

The random walk of page 12, where $Y_t = e_1 + e_2 + \cdots + e_t$, is also constructed
from white noise but is not stationary. For example, the variance function, $Var(Y_t) = t\sigma_e^2$,
is not constant; furthermore, the covariance function $\gamma_{t,s} = t\sigma_e^2$ for $0 \le t \le s$ does
not depend only on time lag. However, suppose that instead of analyzing $\{Y_t\}$ directly,
we consider the differences of successive $Y$-values, denoted $\nabla Y_t$. Then $\nabla Y_t = Y_t - Y_{t-1} = e_t$,
so the differenced series, $\{\nabla Y_t\}$, is stationary. This represents a simple example of a
technique found to be extremely useful in many applications. Clearly, many real time
series cannot be reasonably modeled by stationary processes since they are not in statistical
equilibrium but are evolving over time. However, we can frequently transform nonstationary
series into stationary series by simple techniques such as differencing. Such
techniques will be vigorously pursued in the remaining chapters.
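
The effect of differencing can be seen directly on simulated data. In this minimal R sketch the seed and the length of 200 are arbitrary assumptions; `diff()` and `acf()` are base R functions.

```r
# Differencing a simulated random walk returns the underlying white noise.
set.seed(2024)            # arbitrary seed for reproducibility
e <- rnorm(200)           # white noise e_1, ..., e_200 (length is illustrative)
Y <- cumsum(e)            # random walk Y_t = e_1 + ... + e_t
dY <- diff(Y)             # first differences Y_t - Y_{t-1} for t = 2, ..., 200

# The differences reproduce e_2, ..., e_200 up to floating-point rounding.
stopifnot(isTRUE(all.equal(dY, e[-1])))

# Sample autocorrelations of the walk and of its differences, lags 0 to 10.
acf_walk <- acf(Y, lag.max = 10, plot = FALSE)
acf_diff <- acf(dY, lag.max = 10, plot = FALSE)
print(round(cbind(lag = 0:10,
                  walk = acf_walk$acf[, 1, 1],
                  diff = acf_diff$acf[, 1, 1]), 2))
```

The sample autocorrelations of the walk would be expected to stay large across these lags, while those of the differenced series should be small apart from lag 0.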

2.4  Summary 


In  this  chapter  we  have  introduced  the  basic  concepts  of  stochastic  processes  that  serve 
as  models  for  time  series.  In  particular,  you  should  now  be  familiar  with  the  important 
concepts  of  mean  functions,  autocovariance  functions,  and  autocorrelation  functions. 
We  illustrated  these  concepts  with  the  basic  processes:  the  random  walk,  white  noise,  a 
simple  moving  average,  and  a  random  cosine  wave.  Finally,  the  fundamental  concept  of 
stationarity  introduced  here  will  be  used  throughout  the  book. 


Exercises 


2.1 Suppose $E(X) = 2$, $Var(X) = 9$, $E(Y) = 0$, $Var(Y) = 4$, and $Corr(X, Y) = 0.25$. Find:

(a) $Var(X + Y)$.

(b) $Cov(X, X + Y)$.

(c) $Corr(X + Y, X - Y)$.

2.2 If $X$ and $Y$ are dependent but $Var(X) = Var(Y)$, find $Cov(X + Y, X - Y)$.

2.3 Let $X$ have a distribution with mean $\mu$ and variance $\sigma^2$, and let $Y_t = X$ for all $t$.

(a) Show that $\{Y_t\}$ is strictly and weakly stationary.

(b) Find the autocovariance function for $\{Y_t\}$.

(c) Sketch a “typical” time plot of $Y_t$.

2.4 Let $\{e_t\}$ be a zero-mean white noise process. Suppose that the observed process is
$Y_t = e_t + \theta e_{t-1}$, where $\theta$ is either 3 or 1/3.

(a) Find the autocorrelation function for $\{Y_t\}$ both when $\theta = 3$ and when $\theta = 1/3$.

(b) You should have discovered that the time series is stationary regardless of the
value of $\theta$ and that the autocorrelation functions are the same for $\theta = 3$ and
$\theta = 1/3$. For simplicity, suppose that the process mean is known to be zero and the
variance of $Y_t$ is known to be 1. You observe the series $\{Y_t\}$ for $t = 1, 2, \ldots, n$
and suppose that you can produce good estimates of the autocorrelations $\rho_k$.
Do you think that you could determine which value of $\theta$ is correct (3 or 1/3)
based on the estimate of $\rho_k$? Why or why not?

2.5 Suppose $Y_t = 5 + 2t + X_t$, where $\{X_t\}$ is a zero-mean stationary series with autocovariance
function $\gamma_k$.

(a) Find the mean function for $\{Y_t\}$.

(b) Find the autocovariance function for $\{Y_t\}$.

(c) Is $\{Y_t\}$ stationary? Why or why not?

2.6 Let $\{X_t\}$ be a stationary time series, and define
$$Y_t = \begin{cases} X_t & \text{for } t \text{ odd} \\ X_t + 3 & \text{for } t \text{ even.} \end{cases}$$

(a) Show that $Cov(Y_t, Y_{t-k})$ is free of $t$ for all lags $k$.

(b) Is $\{Y_t\}$ stationary?

2.7 Suppose that $\{Y_t\}$ is stationary with autocovariance function $\gamma_k$.

(a) Show that $W_t = \nabla Y_t = Y_t - Y_{t-1}$ is stationary by finding the mean and autocovariance
function for $\{W_t\}$.

(b) Show that $U_t = \nabla^2 Y_t = \nabla[Y_t - Y_{t-1}] = Y_t - 2Y_{t-1} + Y_{t-2}$ is stationary. (You need
not find the mean and autocovariance function for $\{U_t\}$.)

2.8 Suppose that $\{Y_t\}$ is stationary with autocovariance function $\gamma_k$. Show that for any
fixed positive integer $n$ and any constants $c_1, c_2, \ldots, c_n$, the process $\{W_t\}$ defined
by $W_t = c_1 Y_t + c_2 Y_{t-1} + \cdots + c_n Y_{t-n+1}$ is stationary. (Note that Exercise
2.7 is a special case of this result.)

2.9 Suppose $Y_t = \beta_0 + \beta_1 t + X_t$, where $\{X_t\}$ is a zero-mean stationary series with autocovariance
function $\gamma_k$ and $\beta_0$ and $\beta_1$ are constants.

(a) Show that $\{Y_t\}$ is not stationary but that $W_t = \nabla Y_t = Y_t - Y_{t-1}$ is stationary.

(b) In general, show that if $Y_t = \mu_t + X_t$, where $\{X_t\}$ is a zero-mean stationary
series and $\mu_t$ is a polynomial in $t$ of degree $d$, then $\nabla^m Y_t = \nabla(\nabla^{m-1} Y_t)$ is stationary
for $m \ge d$ and nonstationary for $0 \le m < d$.

2.10 Let $\{X_t\}$ be a zero-mean, unit-variance stationary process with autocorrelation
function $\rho_k$. Suppose that $\mu_t$ is a nonconstant function and that $\sigma_t$ is a positive-valued
nonconstant function. The observed series is formed as $Y_t = \mu_t + \sigma_t X_t$.

(a) Find the mean and covariance function for the $\{Y_t\}$ process.

(b) Show that the autocorrelation function for the $\{Y_t\}$ process depends only on
the time lag. Is the $\{Y_t\}$ process stationary?

(c) Is it possible to have a time series with a constant mean and with
$Corr(Y_t, Y_{t-k})$ free of $t$ but with $\{Y_t\}$ not stationary?

2.11 Suppose $Cov(X_t, X_{t-k}) = \gamma_k$ is free of $t$ but that $E(X_t) = 3t$.

(a) Is $\{X_t\}$ stationary?

(b) Let $Y_t = 7 - 3t + X_t$. Is $\{Y_t\}$ stationary?

2.12 Suppose that $Y_t = e_t - e_{t-12}$. Show that $\{Y_t\}$ is stationary and that, for $k > 0$, its
autocorrelation function is nonzero only for lag $k = 12$.

2.13 Let $Y_t = e_t - \theta(e_{t-1})^2$. For this exercise, assume that the white noise series is normally
distributed.

(a) Find the autocorrelation function for $\{Y_t\}$.

(b) Is $\{Y_t\}$ stationary?

2.14 Evaluate the mean and covariance function for each of the following processes. In
each case, determine whether or not the process is stationary.

(a) $Y_t = \theta_0 + t e_t$.

(b) $W_t = \nabla Y_t$, where $Y_t$ is as given in part (a).

(c) $Y_t = e_t e_{t-1}$. (You may assume that $\{e_t\}$ is normal white noise.)

2.15 Suppose that $X$ is a random variable with zero mean. Define a time series by
$Y_t = (-1)^t X$.

(a) Find the mean function for $\{Y_t\}$.

(b) Find the covariance function for $\{Y_t\}$.

(c) Is $\{Y_t\}$ stationary?

2.16 Suppose $Y_t = A + X_t$, where $\{X_t\}$ is stationary and $A$ is random but independent of
$\{X_t\}$. Find the mean and covariance function for $\{Y_t\}$ in terms of the mean and
autocovariance function for $\{X_t\}$ and the mean and variance of $A$.

2.17 Let $\{Y_t\}$ be stationary with autocovariance function $\gamma_k$. Let $\bar{Y} = \frac{1}{n}\sum_{t=1}^{n} Y_t$.
Show that

$$Var(\bar{Y}) = \frac{\gamma_0}{n} + \frac{2}{n}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\gamma_k = \frac{1}{n}\sum_{k=-n+1}^{n-1}\left(1 - \frac{|k|}{n}\right)\gamma_k$$


2.18 Let $\{Y_t\}$ be stationary with autocovariance function $\gamma_k$. Define the sample variance
as $S^2 = \frac{1}{n-1}\sum_{t=1}^{n}(Y_t - \bar{Y})^2$.

(a) First show that $\sum_{t=1}^{n}(Y_t - \mu)^2 = \sum_{t=1}^{n}(Y_t - \bar{Y})^2 + n(\bar{Y} - \mu)^2$.

(b) Use part (a) to show that

$$E(S^2) = \frac{n}{n-1}\gamma_0 - \frac{n}{n-1}Var(\bar{Y}) = \gamma_0 - \frac{2}{n-1}\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\gamma_k$$

(Use the results of Exercise 2.17 for the last expression.)

(c) If $\{Y_t\}$ is a white noise process with variance $\gamma_0$, show that $E(S^2) = \gamma_0$.

2.19 Let $Y_1 = \theta_0 + e_1$, and then for $t > 1$ define $Y_t$ recursively by $Y_t = \theta_0 + Y_{t-1} + e_t$.
Here $\theta_0$ is a constant. The process $\{Y_t\}$ is called a random walk with drift.

(a) Show that $Y_t$ may be rewritten as $Y_t = t\theta_0 + e_t + e_{t-1} + \cdots + e_1$.

(b) Find the mean function for $Y_t$.

(c) Find the autocovariance function for $Y_t$.

2.20 Consider the standard random walk model where $Y_t = Y_{t-1} + e_t$ with $Y_1 = e_1$.

(a) Use the representation of $Y_t$ above to show that $\mu_t = \mu_{t-1}$ for $t > 1$ with initial
condition $\mu_1 = E(e_1) = 0$. Hence show that $\mu_t = 0$ for all $t$.

(b) Similarly, show that $Var(Y_t) = Var(Y_{t-1}) + \sigma_e^2$ for $t > 1$ with $Var(Y_1) = \sigma_e^2$
and hence $Var(Y_t) = t\sigma_e^2$.

(c) For $0 \le t \le s$, use $Y_s = Y_t + e_{t+1} + e_{t+2} + \cdots + e_s$ to show that $Cov(Y_t, Y_s) = Var(Y_t)$
and, hence, that $Cov(Y_t, Y_s) = \min(t, s)\sigma_e^2$.

2.21 For a random walk with random starting value, let $Y_t = Y_0 + e_t + e_{t-1} + \cdots + e_1$
for $t > 0$, where $Y_0$ has a distribution with mean $\mu_0$ and variance $\sigma_0^2$. Suppose further
that $Y_0, e_1, \ldots, e_t$ are independent.

(a) Show that $E(Y_t) = \mu_0$ for all $t$.

(b) Show that $Var(Y_t) = t\sigma_e^2 + \sigma_0^2$.

(c) Show that $Cov(Y_t, Y_s) = \min(t, s)\sigma_e^2 + \sigma_0^2$.

(d) Show that $Corr(Y_t, Y_s) = \sqrt{\dfrac{t\sigma_e^2 + \sigma_0^2}{s\sigma_e^2 + \sigma_0^2}}$ for $0 \le t \le s$.

2.22 Let $\{e_t\}$ be a zero-mean white noise process, and let $c$ be a constant with $|c| < 1$.
Define $Y_t$ recursively by $Y_t = cY_{t-1} + e_t$ with $Y_1 = e_1$.

(a) Show that $E(Y_t) = 0$.

(b) Show that $Var(Y_t) = \sigma_e^2(1 + c^2 + c^4 + \cdots + c^{2t-2})$. Is $\{Y_t\}$ stationary?

(c) Show that

$$Corr(Y_t, Y_{t-1}) = c\sqrt{\frac{Var(Y_{t-1})}{Var(Y_t)}}$$

and, in general,

$$Corr(Y_t, Y_{t-k}) = c^k\sqrt{\frac{Var(Y_{t-k})}{Var(Y_t)}} \quad \text{for } k > 0$$

Hint: Argue that $Y_{t-1}$ is independent of $e_t$. Then use
$Cov(Y_t, Y_{t-1}) = Cov(cY_{t-1} + e_t, Y_{t-1})$.

(d) For large $t$, argue that

$$Var(Y_t) \approx \frac{\sigma_e^2}{1 - c^2} \quad \text{and} \quad Corr(Y_t, Y_{t-k}) \approx c^k \quad \text{for } k > 0$$

so that $\{Y_t\}$ could be called asymptotically stationary.

(e) Suppose now that we alter the initial condition and put $Y_1 = \dfrac{e_1}{\sqrt{1 - c^2}}$. Show
that now $\{Y_t\}$ is stationary.

2.23 Two processes $\{Z_t\}$ and $\{Y_t\}$ are said to be independent if for any time points $t_1, t_2, \ldots, t_m$
and $s_1, s_2, \ldots, s_n$ the random variables $\{Z_{t_1}, Z_{t_2}, \ldots, Z_{t_m}\}$ are independent
of the random variables $\{Y_{s_1}, Y_{s_2}, \ldots, Y_{s_n}\}$. Show that if $\{Z_t\}$ and $\{Y_t\}$ are independent
stationary processes, then $W_t = Z_t + Y_t$ is stationary.

2.24 Let $\{X_t\}$ be a time series in which we are interested. However, because the measurement
process itself is not perfect, we actually observe $Y_t = X_t + e_t$. We assume
that $\{X_t\}$ and $\{e_t\}$ are independent processes. We call $X_t$ the signal and $e_t$ the
measurement noise or error process.

If $\{X_t\}$ is stationary with autocorrelation function $\rho_k$, show that $\{Y_t\}$ is also stationary
with

$$Corr(Y_t, Y_{t-k}) = \frac{\rho_k}{1 + \sigma_e^2/\sigma_X^2} \quad \text{for } k \ge 1$$

We call $\sigma_X^2/\sigma_e^2$ the signal-to-noise ratio, or SNR. Note that the larger the SNR,
the closer the autocorrelation function of the observed process $\{Y_t\}$ is to the autocorrelation
function of the desired signal $\{X_t\}$.

2.25 Suppose $Y_t = \beta_0 + \sum_{i=1}^{k}[A_i\cos(2\pi f_i t) + B_i\sin(2\pi f_i t)]$, where $\beta_0, f_1, f_2, \ldots, f_k$ are
constants and $A_1, A_2, \ldots, A_k, B_1, B_2, \ldots, B_k$ are independent random variables with
zero means and variances $Var(A_i) = Var(B_i) = \sigma_i^2$. Show that $\{Y_t\}$ is stationary
and find its covariance function.

2.26 Define the function $\Gamma_{t,s} = \frac{1}{2}E[(Y_t - Y_s)^2]$. In geostatistics, $\Gamma_{t,s}$ is called the
semivariogram.

(a) Show that for a stationary process $\Gamma_{t,s} = \gamma_0 - \gamma_{|t-s|}$.

(b) A process is said to be intrinsically stationary if $\Gamma_{t,s}$ depends only on the time
difference $|t - s|$. Show that the random walk process is intrinsically stationary.

2.27 For a fixed, positive integer $r$ and constant $\phi$, consider the time series defined by
$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots + \phi^r e_{t-r}$.

(a) Show that this process is stationary for any value of $\phi$.

(b) Find the autocorrelation function.

2.28 (Random cosine wave extended) Suppose that

$$Y_t = R\cos(2\pi(ft + \Phi)) \quad \text{for } t = 0, \pm 1, \pm 2, \ldots$$

where $0 < f < \frac{1}{2}$ is a fixed frequency and $R$ and $\Phi$ are uncorrelated random variables
and with $\Phi$ uniformly distributed on the interval $(0, 1)$.

(a) Show that $E(Y_t) = 0$ for all $t$.

(b) Show that the process is stationary with $\gamma_k = \frac{1}{2}E(R^2)\cos(2\pi f k)$.

Hint: Use the calculations leading up to Equation (2.3.4), on page 19.

2.29 (Random cosine wave extended further) Suppose that

$$Y_t = \sum_{j=1}^{m} R_j\cos[2\pi(f_j t + \Phi_j)] \quad \text{for } t = 0, \pm 1, \pm 2, \ldots$$

where $0 < f_1 < f_2 < \cdots < f_m < \frac{1}{2}$ are $m$ fixed frequencies, and $R_1, \Phi_1, R_2, \Phi_2, \ldots,
R_m, \Phi_m$ are uncorrelated random variables with each $\Phi_j$ uniformly distributed on
the interval $(0, 1)$.

(a) Show that $E(Y_t) = 0$ for all $t$.

(b) Show that the process is stationary with $\gamma_k = \frac{1}{2}\sum_{j=1}^{m} E(R_j^2)\cos(2\pi f_j k)$.

Hint: Do Exercise 2.28 first.

2.30 (Mathematical statistics required) Suppose that

$$Y_t = R\cos[2\pi(ft + \Phi)] \quad \text{for } t = 0, \pm 1, \pm 2, \ldots$$

where $R$ and $\Phi$ are independent random variables and $f$ is a fixed frequency. The
phase $\Phi$ is assumed to be uniformly distributed on $(0, 1)$, and the amplitude $R$ has
a Rayleigh distribution with pdf $f(r) = re^{-r^2/2}$ for $r > 0$. Show that for each
time point $t$, $Y_t$ has a normal distribution. (Hint: Let $Y = R\cos[2\pi(ft + \Phi)]$ and
$X = R\sin[2\pi(ft + \Phi)]$. Now find the joint distribution of $X$ and $Y$. It can also be
shown that all of the finite dimensional distributions are multivariate normal and
hence the process is strictly stationary.)


Appendix  A:  Expectation,  Variance,  Covariance, 
and  Correlation 


In this appendix, we define expectation for continuous random variables. However, all
of the properties described hold for all types of random variables, discrete, continuous,
or otherwise. Let $X$ have probability density function $f(x)$ and let the pair $(X, Y)$ have
joint probability density function $f(x, y)$.

The expected value of $X$ is defined as $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$.
(If $\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty$; otherwise $E(X)$ is undefined.) $E(X)$ is also called the expectation
of $X$ or the mean of $X$ and is often denoted $\mu$ or $\mu_X$.

Properties  of  Expectation 

If $h(x)$ is a function such that $\int_{-\infty}^{\infty} |h(x)| f(x)\,dx < \infty$, it may be shown that

$$E[h(X)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx$$

Similarly, if $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |h(x, y)| f(x, y)\,dx\,dy < \infty$, it may be shown that

$$E[h(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y) f(x, y)\,dx\,dy \tag{2.A.1}$$

As a corollary to Equation (2.A.1), we easily obtain the important result

$$E(aX + bY + c) = aE(X) + bE(Y) + c \tag{2.A.2}$$

We also have

$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy \tag{2.A.3}$$

The variance of a random variable $X$ is defined as

$$Var(X) = E\{[X - E(X)]^2\} \tag{2.A.4}$$

(provided $E(X^2)$ exists). The variance of $X$ is often denoted by $\sigma^2$ or $\sigma_X^2$.

Properties of Variance

$$Var(X) \ge 0 \tag{2.A.5}$$

$$Var(a + bX) = b^2 Var(X) \tag{2.A.6}$$

If $X$ and $Y$ are independent, then

$$Var(X + Y) = Var(X) + Var(Y) \tag{2.A.7}$$

In general, it may be shown that

$$Var(X) = E(X^2) - [E(X)]^2 \tag{2.A.8}$$

The positive square root of the variance of $X$ is called the standard deviation of $X$ and
is often denoted by $\sigma$ or $\sigma_X$. The random variable $(X - \mu_X)/\sigma_X$ is called the standardized
version of $X$. The mean and standard deviation of a standardized variable are
always zero and one, respectively.

The covariance of $X$ and $Y$ is defined as $Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$.


Properties of Covariance

$$Cov(a + bX, c + dY) = bd\,Cov(X, Y) \tag{2.A.9}$$

$$Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) \tag{2.A.10}$$

$$Cov(X + Y, Z) = Cov(X, Z) + Cov(Y, Z) \tag{2.A.11}$$

$$Cov(X, X) = Var(X) \tag{2.A.12}$$

$$Cov(X, Y) = Cov(Y, X) \tag{2.A.13}$$

If $X$ and $Y$ are independent,

$$Cov(X, Y) = 0 \tag{2.A.14}$$

The correlation coefficient of $X$ and $Y$, denoted by $Corr(X, Y)$ or $\rho$, is defined as

$$\rho = Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X)Var(Y)}}$$

Alternatively, if $X^*$ is a standardized $X$ and $Y^*$ is a standardized $Y$, then $\rho = E(X^*Y^*)$.

Properties of Correlation

$$-1 \le Corr(X, Y) \le 1 \tag{2.A.15}$$

$$Corr(a + bX, c + dY) = sign(bd)\,Corr(X, Y) \tag{2.A.16}$$

$$\text{where } sign(bd) = \begin{cases}
1 & \text{if } bd > 0 \\
0 & \text{if } bd = 0 \\
-1 & \text{if } bd < 0
\end{cases}$$

$Corr(X, Y) = \pm 1$ if and only if there are constants $a$ and $b$ such that $Pr(Y = a + bX) = 1$.
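
Several of these properties have exact sample analogues, which makes them easy to check numerically. The R sketch below is an illustration only: the simulated data, the seed, and the constants `a`, `b`, `c0`, `d` are arbitrary assumptions. It compares both sides of (2.A.10) and (2.A.16) using the sample functions `var()`, `cov()`, and `cor()`.

```r
# Sample analogues of (2.A.10) and (2.A.16); the data are simulated purely
# for illustration and the constants a, b, c0, d are arbitrary.
set.seed(42)
x <- rnorm(500)
y <- 0.6 * x + rnorm(500)          # y is correlated with x by construction

# (2.A.10): Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
lhs <- var(x + y)
rhs <- var(x) + var(y) + 2 * cov(x, y)

# (2.A.16): Corr(a + bX, c + dY) = sign(bd) Corr(X, Y)
a <- 1; b <- -2; c0 <- 5; d <- 3
corr_lin <- cor(a + b * x, c0 + d * y)
corr_ref <- sign(b * d) * cor(x, y)

print(c(lhs = lhs, rhs = rhs, corr_lin = corr_lin, corr_ref = corr_ref))
```

Because the identities hold for sample moments as well as for population moments, the paired values should agree up to floating-point rounding.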


Chapter  3 

Trends 


In a general time series, the mean function is a totally arbitrary function of time. In a stationary
time series, the mean function must be constant in time. Frequently we need to
take the middle ground and consider mean functions that are relatively simple (but not
constant) functions of time. These trends are considered in this chapter.

3.1  Deterministic  Versus  Stochastic  Trends 


“Trends” can be quite elusive. The same time series may be viewed quite differently by
different analysts. The simulated random walk shown in Exhibit 2.1 might be considered
to display a general upward trend. However, we know that the random walk process
has zero mean for all time. The perceived trend is just an artifact of the strong positive
correlation between the series values at nearby time points and the increasing variance
in the process as time goes by. A second and third simulation of exactly the same process
might well show completely different “trends.” We ask you to produce some additional
simulations in the exercises. Some authors have described such trends as
stochastic trends (see Box, Jenkins, and Reinsel, 1994), although there is no generally
accepted definition of a stochastic trend.

The average monthly temperature series plotted in Exhibit 1.7 on page 6 shows a
cyclical or seasonal trend, but here the reason for the trend is clear: the Northern
Hemisphere’s changing inclination toward the sun. In this case, a possible model might
be $Y_t = \mu_t + X_t$, where $\mu_t$ is a deterministic function that is periodic with period 12; that
is, $\mu_t$ should satisfy

$$\mu_t = \mu_{t-12} \quad \text{for all } t$$

We might assume that $X_t$, the unobserved variation around $\mu_t$, has zero mean for all $t$ so
that indeed $\mu_t$ is the mean function for the observed series $Y_t$. We could describe this
model as having a deterministic trend as opposed to the stochastic trend considered
earlier. In other situations we might hypothesize a deterministic trend that is linear in
time (that is, $\mu_t = \beta_0 + \beta_1 t$) or perhaps a quadratic time trend, $\mu_t = \beta_0 + \beta_1 t + \beta_2 t^2$. Note
that an implication of the model $Y_t = \mu_t + X_t$ with $E(X_t) = 0$ for all $t$ is that the deterministic
trend applies for all time. Thus, if $\mu_t = \beta_0 + \beta_1 t$, we are assuming that the same
linear time trend applies forever. We should therefore have good reasons for assuming
such a model, not just because the series looks somewhat linear over the time period
observed.

In this chapter, we consider methods for modeling deterministic trends. Stochastic
trends will be discussed in Chapter 5, and stochastic seasonal models will be discussed
in Chapter 10. Many authors use the word trend only for a slowly changing mean function,
such as a linear time trend, and use the term seasonal component for a mean function
that varies cyclically. We do not find it useful to make such distinctions here.

3.2  Estimation  of  a  Constant  Mean 


We first consider the simple situation where a constant mean function is assumed. Our
model may then be written as

$$Y_t = \mu + X_t \tag{3.2.1}$$

where $E(X_t) = 0$ for all $t$. We wish to estimate $\mu$ with our observed time series $Y_1, Y_2, \ldots, Y_n$.
The most common estimate of $\mu$ is the sample mean or average defined as

$$\bar{Y} = \frac{1}{n}\sum_{t=1}^{n} Y_t \tag{3.2.2}$$

Under the minimal assumptions of Equation (3.2.1), we see that $E(\bar{Y}) = \mu$; therefore
$\bar{Y}$ is an unbiased estimate of $\mu$. To investigate the precision of $\bar{Y}$ as an estimate of
$\mu$, we need to make further assumptions concerning $X_t$.

Suppose that $\{Y_t\}$ (or, equivalently, $\{X_t\}$ of Equation (3.2.1)) is a stationary time
series with autocorrelation function $\rho_k$. Then, by Exercise 2.17, we have


$$Var(\bar{Y}) = \frac{\gamma_0}{n}\left[1 + 2\sum_{k=1}^{n-1}\left(1 - \frac{k}{n}\right)\rho_k\right] \tag{3.2.3}$$

Notice that the first factor, $\gamma_0/n$, is the process (population) variance divided by the sample
size, a concept with which we are familiar in simpler random sampling contexts. If
the series $\{X_t\}$ of Equation (3.2.1) is just white noise, then $\rho_k = 0$ for $k > 0$ and $Var(\bar{Y})$
reduces to simply $\gamma_0/n$.

In the (stationary) moving average model $Y_t = e_t - \frac{1}{2}e_{t-1}$, we find that $\rho_1 = -0.4$
and $\rho_k = 0$ for $k > 1$. In this case, we have

$$Var(\bar{Y}) = \frac{\gamma_0}{n}\left[1 + 2\left(1 - \frac{1}{n}\right)(-0.4)\right] = \frac{\gamma_0}{n}\left[1 - 0.8\left(\frac{n - 1}{n}\right)\right]$$

For values of $n$ usually occurring in time series ($n > 50$, say), the factor $(n - 1)/n$
will be close to 1, so that we have

$$Var(\bar{Y}) \approx 0.2\frac{\gamma_0}{n}$$

We see that the negative correlation at lag 1 has improved the estimation of the mean
compared with the estimation obtained in the white noise (random sample) situation.
Because the series tends to oscillate back and forth across the mean, the sample mean
obtained is more precise.
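
Equation (3.2.3) is straightforward to evaluate for any set of autocorrelations. The following R sketch is a minimal illustration; `var_mean` is a hypothetical helper name, and the choices $\gamma_0 = 1$ and $n = 50$ are assumptions made only for the example.

```r
# Variance of the sample mean from Equation (3.2.3):
#   Var(Ybar) = (gamma0 / n) * [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho_k]
# var_mean is a hypothetical helper; rho holds rho_1, rho_2, ... and any
# lags beyond length(rho) are treated as zero.
var_mean <- function(gamma0, rho, n) {
  k <- seq_len(min(length(rho), n - 1))
  (gamma0 / n) * (1 + 2 * sum((1 - k / n) * rho[k]))
}

n <- 50
white_noise <- var_mean(gamma0 = 1, rho = numeric(0), n = n)  # rho_k = 0
ma_example <- var_mean(gamma0 = 1, rho = -0.4, n = n)         # rho_1 = -0.4
print(c(white_noise = white_noise, ma_example = ma_example,
        ratio = ma_example / white_noise))
```

The printed ratio is the factor $1 - 0.8(n - 1)/n$ from the text, which approaches 0.2 as $n$ grows.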

On the other hand, if $\rho_k \ge 0$ for all $k \ge 1$, we see from Equation (3.2.3) that $Var(\bar{Y})$
will be larger than $\gamma_0/n$. Here the positive correlations make estimation of the mean more
difficult than in the white noise case. In general, some correlations will be positive and
some negative, and Equation (3.2.3) must be used to assess the total effect.

For many stationary processes, the autocorrelation function decays quickly enough
with increasing lags that

$$\sum_{k=0}^{\infty} |\rho_k| < \infty \tag{3.2.4}$$

(The random cosine wave of Chapter 2 is an exception.)

Under assumption (3.2.4) and given a large sample size $n$, the following useful
approximation follows from Equation (3.2.3) (see Anderson, 1971, p. 459, for example):

$$Var(\bar{Y}) \approx \frac{\gamma_0}{n}\left[\sum_{k=-\infty}^{\infty} \rho_k\right] \quad \text{for large } n \tag{3.2.5}$$

Notice that to this approximation the variance is inversely proportional to the sample
size $n$.

As an example, suppose that $\rho_k = \phi^{|k|}$ for all $k$, where $\phi$ is a number strictly between
$-1$ and $+1$. Summing a geometric series yields

$$Var(\bar{Y}) \approx \frac{(1 + \phi)}{(1 - \phi)}\frac{\gamma_0}{n} \tag{3.2.6}$$
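
To get a feel for how quickly the large-sample approximation takes hold, the exact expression (3.2.3) can be compared with (3.2.6) for $\rho_k = \phi^{|k|}$. The R sketch below is illustrative only; the values $\phi = 0.5$, $\gamma_0 = 1$, and the sample sizes are arbitrary assumptions.

```r
# Compare the exact variance (3.2.3) with the large-sample form (3.2.6)
# when rho_k = phi^|k|. The values of phi and n are illustrative choices.
phi <- 0.5
gamma0 <- 1
for (n in c(10, 50, 500)) {
  k <- 1:(n - 1)
  exact <- (gamma0 / n) * (1 + 2 * sum((1 - k / n) * phi^k))
  approx <- ((1 + phi) / (1 - phi)) * gamma0 / n
  cat(sprintf('n = %3d  exact = %.5f  approx = %.5f  ratio = %.4f\n',
              n, exact, approx, exact / approx))
}
```

The ratio of exact to approximate variance should move toward 1 as $n$ increases, since the weights $(1 - k/n)$ in (3.2.3) approach 1 for every fixed lag.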

For a nonstationary process (but with a constant mean), the precision of the sample
mean as an estimate of $\mu$ can be strikingly different. As a useful example, suppose that
in Equation (3.2.1) $\{X_t\}$ is a random walk process as described in Chapter 2. Then
directly from Equation (2.2.8) we have

$$\begin{aligned}
Var(\bar{Y}) &= \frac{1}{n^2}Var\left[\sum_{i=1}^{n} Y_i\right] \\
&= \frac{1}{n^2}Var\left[\sum_{i=1}^{n}\sum_{j=1}^{i} e_j\right] \\
&= \frac{1}{n^2}Var(ne_1 + (n-1)e_2 + (n-2)e_3 + \cdots + e_n) \\
&= \frac{\sigma_e^2}{n^2}\sum_{k=1}^{n} k^2
\end{aligned}$$

so that

$$Var(\bar{Y}) = \sigma_e^2(2n + 1)\frac{(n + 1)}{6n} \tag{3.2.7}$$

Notice that in this special case the variance of our estimate of the mean actually
increases as the sample size $n$ increases. Clearly this is unacceptable, and we need to
consider other estimation techniques for nonstationary series.

3.3  Regression  Methods 


The  classical  statistical  method  of  regression  analysis  may  be  readily  used  to  estimate 
the  parameters  of  common  nonconstant  mean  trend  models.  We  shall  consider  the  most 
useful  ones:  linear,  quadratic,  seasonal  means,  and  cosine  trends. 


Linear  and  Quadratic  Trends  in  Time 

Consider  the  deterministic  time  trend  expressed  as 


$$\mu_t = \beta_0 + \beta_1 t \tag{3.3.1}$$

where the slope and intercept, $\beta_1$ and $\beta_0$ respectively, are unknown parameters. The classical least squares (or regression) method is to choose as estimates of $\beta_1$ and $\beta_0$ values that minimize

$$Q(\beta_0, \beta_1) = \sum_{t=1}^{n}\left[Y_t - (\beta_0 + \beta_1 t)\right]^2$$


The solution may be obtained in several ways, for example, by computing the partial derivatives with respect to both $\beta$'s, setting the results equal to zero, and solving the resulting linear equations for the $\beta$'s. Denoting the solutions by $\hat{\beta}_0$ and $\hat{\beta}_1$, we find that

$$\hat{\beta}_1 = \frac{\sum_{t=1}^{n}(Y_t - \bar{Y})(t - \bar{t})}{\sum_{t=1}^{n}(t - \bar{t})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{t} \tag{3.3.2}$$

where $\bar{t} = (n + 1)/2$ is the average of 1, 2,..., $n$. These formulas can be simplified somewhat, and various versions of the formulas are well-known. However, we assume that the computations will be done by statistical software and we will not pursue other expressions for $\hat{\beta}_0$ and $\hat{\beta}_1$ here.
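As a quick check that software reproduces Equation (3.3.2), here is a minimal sketch in base R. The series is simulated with an assumed linear trend (intercept 2, slope 0.5, chosen only for illustration), and the names `time_index`, `b0`, and `b1` are hypothetical.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
n <- 60
time_index <- 1:n
y <- 2 + 0.5 * time_index + rnorm(n)         # simulated series, assumed linear trend

tbar <- (n + 1) / 2                          # average of 1, 2, ..., n
# Slope and intercept from Equation (3.3.2)
b1 <- sum((y - mean(y)) * (time_index - tbar)) / sum((time_index - tbar)^2)
b0 <- mean(y) - b1 * tbar

fit <- lm(y ~ time_index)                    # least squares fit by software
print(c(b0 = b0, b1 = b1))
print(coef(fit))
```

Both printed pairs should agree up to floating-point error.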

Example 

Consider the random walk process that was shown in Exhibit 2.1. Suppose we (mistakenly) treat this as a linear time trend and estimate the slope and intercept by least-squares regression. Using statistical software we obtain Exhibit 3.1.


Exhibit 3.1 Least Squares Regression Estimates for Linear Time Trend

            Estimate   Std. Error   t-value   Pr(>|t|)
Intercept   -1.008     0.2972       -3.39     0.00126
Time         0.1341    0.00848      15.82     < 0.0001

> data(rwalk)
> model1=lm(rwalk~time(rwalk))
> summary(model1)

So here the estimated slope and intercept are $\hat{\beta}_1 = 0.1341$ and $\hat{\beta}_0 = -1.008$, respectively. Exhibit 3.2 displays the random walk with the least squares regression trend line superimposed. We will interpret more of the regression output later in Section 3.5 on page 40 and see that fitting a line to these data is not appropriate.


Exhibit  3.2  Random  Walk  with  Linear  Time  Trend 


> win.graph(width=4.875, height=2.5, pointsize=8)
> plot(rwalk, type='o', ylab='y')
> abline(model1) # add the fitted least squares line from model1

Cyclical or Seasonal Trends

Consider now modeling and estimating seasonal trends, such as for the average monthly temperature data in Exhibit 1.7. Here we assume that the observed series can be represented as

$$Y_t = \mu_t + X_t$$

where $E(X_t) = 0$ for all $t$.

The most general assumption for $\mu_t$ with monthly seasonal data is that there are 12 constants (parameters), $\beta_1, \beta_2, \ldots, \beta_{12}$, giving the expected average temperature for each of the 12 months. We may write

$$\mu_t = \begin{cases} \beta_1 & \text{for } t = 1, 13, 25, \ldots \\ \beta_2 & \text{for } t = 2, 14, 26, \ldots \\ \;\vdots \\ \beta_{12} & \text{for } t = 12, 24, 36, \ldots \end{cases} \tag{3.3.3}$$


This  is  sometimes  called  a  seasonal  means  model. 

As  an  example  of  this  model  consider  the  average  monthly  temperature  data  shown 
in  Exhibit  1.7  on  page  6.  To  fit  such  a  model,  we  need  to  set  up  indicator  variables 
(sometimes  called  dummy  variables)  that  indicate  the  month  to  which  each  of  the  data 
points  pertains.  The  procedure  for  doing  this  will  depend  on  the  particular  statistical 
software  that  you  use.  We  also  need  to  note  that  the  model  as  stated  does  not  contain  an 
intercept  term,  and  the  software  will  need  to  know  this  also.  Alternatively,  we  could  use 
an intercept and leave out any one of the $\beta$'s in Equation (3.3.3).

Exhibit 3.3 displays the results of fitting the seasonal means model to the temperature data. Here the t-values and Pr(>|t|)-values reported are of little interest since they relate to testing the null hypotheses that the $\beta$'s are zero, not an interesting hypothesis in this case.


Exhibit 3.3 Regression Results for the Seasonal Means Model

            Estimate   Std. Error   t-value   Pr(>|t|)
January     16.608     0.987        16.8      < 0.0001
February    20.650     0.987        20.9      < 0.0001
March       32.475     0.987        32.9      < 0.0001
April       46.525     0.987        47.1      < 0.0001
May         58.092     0.987        58.9      < 0.0001
June        67.500     0.987        68.4      < 0.0001
July        71.717     0.987        72.7      < 0.0001
August      69.333     0.987        70.2      < 0.0001
September   61.025     0.987        61.8      < 0.0001
October     50.975     0.987        51.6      < 0.0001
November    36.650     0.987        37.1      < 0.0001
December    23.642     0.987        24.0      < 0.0001

> data(tempdub)
> month.=season(tempdub) # period added to improve table display
> model2=lm(tempdub~month.-1) # -1 removes the intercept term
> summary(model2)

Exhibit 3.4 shows how the results change when we fit a model with an intercept term. The software omits the January coefficient in this case. Now the February coefficient is interpreted as the difference between February and January average temperatures, the March coefficient is the difference between March and January average temperatures, and so forth. Once more, the t-values and Pr(>|t|) (p-values) are testing hypotheses of little interest in this case. Notice that the Intercept coefficient plus the February coefficient here equals the February coefficient displayed in Exhibit 3.3.
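The link between the two parameterizations can be sketched in base R without the temperature data. The example below is only illustrative: the monthly series is simulated with an assumed seasonal shape, and the names `fit_means` and `fit_int` are hypothetical. It shows that the no-intercept coefficients are the monthly averages and that the intercept plus the February effect recovers the February mean.

```r
set.seed(2)                                   # arbitrary seed
years <- 12
month <- factor(rep(month.name, times = years), levels = month.name)
true_means <- 45 + 27 * cos(2 * pi * (1:12 - 7) / 12)   # assumed seasonal shape
temp <- rep(true_means, times = years) + rnorm(12 * years, sd = 3)

fit_means <- lm(temp ~ month - 1)   # no intercept: one mean per month
fit_int   <- lm(temp ~ month)       # intercept: January is the baseline level

monthly_avg <- tapply(temp, month, mean)
# Seasonal means coefficients are simply the monthly averages
print(all.equal(unname(coef(fit_means)), as.vector(monthly_avg)))
# Intercept plus February effect reproduces the February mean
print(sum(coef(fit_int)[c("(Intercept)", "monthFebruary")]))
print(coef(fit_means)[["monthFebruary"]])
```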

Exhibit 3.4 Results for Seasonal Means Model with an Intercept

            Estimate   Std. Error   t-value   Pr(>|t|)
Intercept   16.608     0.987        16.83     < 0.0001
February     4.042     1.396         2.90     0.00443
March       15.867     1.396        11.37     < 0.0001
April       29.917     1.396        21.43     < 0.0001
May         41.483     1.396        29.72     < 0.0001
June        50.892     1.396        36.46     < 0.0001
July        55.108     1.396        39.48     < 0.0001
August      52.725     1.396        37.78     < 0.0001
September   44.417     1.396        31.82     < 0.0001
October     34.367     1.396        24.62     < 0.0001
November    20.042     1.396        14.36     < 0.0001
December     7.033     1.396         5.04     < 0.0001

> model3=lm(tempdub~month.) # January is dropped automatically
> summary(model3)

Cosine Trends

The  seasonal  means  model  for  monthly  data  consists  of  12  independent  parameters  and 
does  not  take  the  shape  of  the  seasonal  trend  into  account  at  all.  For  example,  the  fact 
that  the  March  and  April  means  are  quite  similar  (and  different  from  the  June  and  July 
means) is not reflected in the model. In some cases, seasonal trends can be modeled economically with cosine curves that incorporate the smooth change expected from one
time  period  to  the  next  while  still  preserving  the  seasonality. 

Consider the cosine curve with equation

$$\mu_t = \beta\cos(2\pi f t + \Phi) \tag{3.3.4}$$

We call $\beta$ (> 0) the amplitude, $f$ the frequency, and $\Phi$ the phase of the curve. As $t$ varies, the curve oscillates between a maximum of $\beta$ and a minimum of $-\beta$. Since the curve repeats itself exactly every $1/f$ time units, $1/f$ is called the period of the cosine wave. As noted in Chapter 2, $\Phi$ serves to set the arbitrary origin on the time axis. For monthly data with time indexed as 1, 2,..., the most important frequency is $f = 1/12$, because such a cosine wave will repeat itself every 12 months. We say that the period is 12.

Equation (3.3.4) is inconvenient for estimation because the parameters $\beta$ and $\Phi$ do not enter the expression linearly. Fortunately, a trigonometric identity is available that reparameterizes (3.3.4) more conveniently, namely

$$\beta\cos(2\pi f t + \Phi) = \beta_1\cos(2\pi f t) + \beta_2\sin(2\pi f t) \tag{3.3.5}$$

where

$$\beta = \sqrt{\beta_1^2 + \beta_2^2}, \qquad \Phi = \operatorname{atan}(-\beta_2/\beta_1) \tag{3.3.6}$$

and, conversely,

$$\beta_1 = \beta\cos(\Phi), \qquad \beta_2 = -\beta\sin(\Phi) \tag{3.3.7}$$

To estimate the parameters $\beta_1$ and $\beta_2$ with regression techniques, we simply use $\cos(2\pi f t)$ and $\sin(2\pi f t)$ as regressors or predictor variables.

The simplest such model for the trend would be expressed as

$$\mu_t = \beta_0 + \beta_1\cos(2\pi f t) + \beta_2\sin(2\pi f t) \tag{3.3.8}$$

Here the constant term, $\beta_0$, can be meaningfully thought of as a cosine with frequency zero.
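The reparameterization can be illustrated with a short base R sketch on simulated data. The amplitude 27, phase 0.5, and noise level are assumptions made only for this example. Note that `atan2(-b2, b1)` is used in place of `atan(-b2/b1)`; this is a substitution on our part so that the phase lands in the correct quadrant when $\hat{\beta}_1$ is negative.

```r
set.seed(3)                                   # arbitrary seed
n <- 144
time_index <- 1:n
f <- 1 / 12                                   # fundamental frequency for monthly data
beta_true <- 27                               # assumed amplitude
phase_true <- 0.5                             # assumed phase
y <- 46 + beta_true * cos(2 * pi * f * time_index + phase_true) + rnorm(n, sd = 3)

# Regress on the cosine and sine terms, as in Equation (3.3.8)
fit <- lm(y ~ cos(2 * pi * f * time_index) + sin(2 * pi * f * time_index))
b1 <- coef(fit)[[2]]
b2 <- coef(fit)[[3]]

amplitude <- sqrt(b1^2 + b2^2)                # Equation (3.3.6)
phase <- atan2(-b2, b1)                       # quadrant-aware form of atan(-b2/b1)
print(c(amplitude = amplitude, phase = phase))
```

The recovered amplitude and phase should be close to the assumed values, up to sampling error.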

In  any  practical  example,  we  must  be  careful  how  we  measure  time,  as  our  choice 
of  time  measurement  will  affect  the  values  of  the  frequencies  of  interest.  For  example,  if 
we  have  monthly  data  but  use  1,  2,  3,...  as  our  time  scale,  then  1/12  would  be  the  most 
interesting frequency, with a corresponding period of 12 months. However, if we measure time by year and fractional year, say 1980 for January, 1980.08333 for February of 1980, and so forth, then a frequency of 1 corresponds to an annual or 12 month periodicity.

Exhibit 3.5 is an example of fitting a cosine curve at the fundamental frequency to the average monthly temperature series.

Exhibit 3.5 Cosine Trend Model for Temperature Series

Coefficient   Estimate    Std. Error   t-value   Pr(>|t|)
Intercept      46.2660    0.3088       149.82    < 0.0001
cos(2πt)      -26.7079    0.4367       -61.15    < 0.0001
sin(2πt)       -2.1697    0.4367        -4.97    < 0.0001

> har.=harmonic(tempdub,1)
> model4=lm(tempdub~har.)
> summary(model4)

In this output, time is measured in years, with 1964 as the starting value and a frequency of 1 per year. A graph of the time series values together with the fitted cosine curve is shown in Exhibit 3.6. The trend fits the data quite well with the exception of most of the January values, where the observations are lower than the model would predict.


Exhibit  3.6  Cosine  Trend  for  the  Temperature  Series 


[Figure: Temperature versus Time, 1964 to 1976, with the fitted cosine curve.]

> win.graph(width=4.875, height=2.5, pointsize=8)
> plot(ts(fitted(model4),freq=12,start=c(1964,1)),
   ylab='Temperature',type='l',
   ylim=range(c(fitted(model4),tempdub))); points(tempdub)
> # ylim ensures that the y axis range fits the raw data and the fitted values

Additional cosine functions at other frequencies will frequently be used to model cyclical trends. For monthly series, the higher harmonic frequencies, such as 2/12 and 3/12, are especially pertinent and will sometimes improve the fit at the expense of adding more parameters to the model. In fact, it may be shown that any periodic trend with period 12 may be expressed exactly by the sum of six pairs of cosine-sine functions. These ideas are discussed in detail in Fourier analysis or spectral analysis. We pursue these ideas further in Chapters 13 and 14.

3.4  Reliability  and  Efficiency  of  Regression  Estimates 


We assume that the series is represented as $Y_t = \mu_t + X_t$, where $\mu_t$ is a deterministic trend of the kind considered above and $\{X_t\}$ is a zero-mean stationary process with autocovariance and autocorrelation functions $\gamma_k$ and $\rho_k$, respectively. Ordinary regression estimates parameters in a linear model according to the criterion of least squares regardless of whether we are fitting linear time trends, seasonal means, cosine curves, or whatever.

We first consider the easiest case, the seasonal means. As mentioned earlier, the least squares estimates of the seasonal means are just seasonal averages; thus, if we have $N$ (complete) years of monthly data, we can write the estimate for the mean for the $j$th season as

$$\hat{\beta}_j = \frac{1}{N}\sum_{i=0}^{N-1} Y_{j+12i}$$

Since $\hat{\beta}_j$ is an average like $\bar{Y}$ but uses only every 12th observation, Equation (3.2.3) can be easily modified to give $\mathrm{Var}(\hat{\beta}_j)$. We replace $n$ by $N$ (years) and $\rho_k$ by $\rho_{12k}$ to get

$$\mathrm{Var}(\hat{\beta}_j) = \frac{\gamma_0}{N}\left[1 + 2\sum_{k=1}^{N-1}\left(1 - \frac{k}{N}\right)\rho_{12k}\right] \quad \text{for } j = 1, 2, \ldots, 12 \tag{3.4.1}$$

We notice that if $\{X_t\}$ is white noise, then $\mathrm{Var}(\hat{\beta}_j)$ reduces to $\gamma_0/N$, as expected. Furthermore, if several $\rho_k$ are nonzero but $\rho_{12k} = 0$, then we still have $\mathrm{Var}(\hat{\beta}_j) = \gamma_0/N$. In any case, only the seasonal autocorrelations, $\rho_{12}, \rho_{24}, \rho_{36}, \ldots$, enter into Equation (3.4.1). Since $N$ will rarely be very large (except perhaps for quarterly data), approximations like those shown in Equation (3.2.5) will usually not be useful.

We turn now to the cosine trends expressed as in Equation (3.3.8). For any frequency of the form $f = m/n$, where $m$ is an integer satisfying $1 \le m < n/2$, explicit expressions are available for the estimates $\hat{\beta}_1$ and $\hat{\beta}_2$, the amplitudes of the cosine and sine:

$$\hat{\beta}_1 = \frac{2}{n}\sum_{t=1}^{n}\left[\cos\left(\frac{2\pi m t}{n}\right)Y_t\right], \qquad \hat{\beta}_2 = \frac{2}{n}\sum_{t=1}^{n}\left[\sin\left(\frac{2\pi m t}{n}\right)Y_t\right] \tag{3.4.2}$$

(These are effectively the correlations between the time series $\{Y_t\}$ and the cosine and sine waves with frequency $m/n$.)

Because these are linear functions of $\{Y_t\}$, we may evaluate their variances using Equation (2.2.6). We find

$$\mathrm{Var}(\hat{\beta}_1) = \frac{2\gamma_0}{n}\left[1 + \frac{4}{n}\sum_{s=2}^{n}\sum_{t=1}^{s-1}\cos\left(\frac{2\pi m t}{n}\right)\cos\left(\frac{2\pi m s}{n}\right)\rho_{s-t}\right] \tag{3.4.3}$$

where we have used the fact that $\sum_{t=1}^{n}\left[\cos(2\pi m t/n)\right]^2 = n/2$. However, the double sum in Equation (3.4.3) does not, in general, reduce further. A similar expression holds for $\mathrm{Var}(\hat{\beta}_2)$ if we replace the cosines by sines.

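If $\{X_t\}$ is white noise, we get just $2\gamma_0/n$. If $\rho_1 \neq 0$, $\rho_k = 0$ for $k > 1$, and $m/n = 1/12$, then the variance reduces to

$$\mathrm{Var}(\hat{\beta}_1) = \frac{2\gamma_0}{n}\left[1 + \frac{4\rho_1}{n}\sum_{t=1}^{n-1}\cos\left(\frac{\pi t}{6}\right)\cos\left(\frac{\pi(t+1)}{6}\right)\right] \tag{3.4.4}$$

To illustrate the effect of the cosine terms, consider the value of this expression for large $n$:

$$\mathrm{Var}(\hat{\beta}_1) \approx \frac{2\gamma_0}{n}\left(1 + 2\rho_1\cos\frac{\pi}{6}\right) = \frac{2\gamma_0}{n}(1 + 1.732\rho_1) \tag{3.4.5}$$

If $\rho_1 = -0.4$, then the large sample multiplier in Equation (3.4.5) is $1 + 1.732(-0.4) = 0.307$ and the variance is reduced by about 70% when compared with the white noise case.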
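The finite-sample behavior of the sum in Equation (3.4.4) can be evaluated directly. The following base R sketch computes the coefficient that multiplies $\rho_1$ inside the bracket for a few sample sizes; the sample sizes and the function name `rho1_coefficient` are our own choices for illustration.

```r
# Coefficient of rho_1 in the bracket of Equation (3.4.4), for frequency 1/12
rho1_coefficient <- function(n) {
  t <- 1:(n - 1)
  (4 / n) * sum(cos(pi * t / 6) * cos(pi * (t + 1) / 6))
}

for (n in c(25, 50, 500)) {
  cat(sprintf("n = %3d  coefficient of rho_1 = %.3f\n", n, rho1_coefficient(n)))
}

limit_coefficient <- 2 * cos(pi / 6)          # large-sample value, about 1.732
rho1 <- -0.4
print(1 + limit_coefficient * rho1)           # large-sample multiplier
```

The coefficient approaches $2\cos(\pi/6)$ as $n$ increases, so the large-sample multiplier is a reasonable guide for series of a few hundred observations.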

In  some  circumstances,  seasonal  means  and  cosine  trends  could  be  considered  as 
competing  models  for  a  cyclical  trend.  If  the  simple  cosine  model  is  an  adequate  model, 
how  much  do  we  lose  if  we  use  the  less  parsimonious  seasonal  means  model?  To 
approach this problem, we must first consider how to compare the models. The parameters themselves are not directly comparable, but we can compare the estimates of the
trend  at  comparable  time  points. 

Consider the two estimates for the trend in January; that is, $\mu_1$. With seasonal means, this estimate is just the January average, which has variance given by Equation (3.4.1). With the cosine trend model, the corresponding estimate is

$$\hat{\mu}_1 = \hat{\beta}_0 + \hat{\beta}_1\cos\left(\frac{2\pi}{12}\right) + \hat{\beta}_2\sin\left(\frac{2\pi}{12}\right)$$

To compute the variance of this estimate, we need one more fact: With this model, the estimates $\hat{\beta}_0$, $\hat{\beta}_1$, and $\hat{\beta}_2$ are uncorrelated.† This follows from the orthogonality relationships of the cosines and sines involved. See Bloomfield (1976) or Fuller (1996) for more details. For the cosine model, then, we have

$$\mathrm{Var}(\hat{\mu}_1) = \mathrm{Var}(\hat{\beta}_0) + \mathrm{Var}(\hat{\beta}_1)\left[\cos\left(\frac{2\pi}{12}\right)\right]^2 + \mathrm{Var}(\hat{\beta}_2)\left[\sin\left(\frac{2\pi}{12}\right)\right]^2 \tag{3.4.6}$$


For our first comparison, assume that the stochastic component is white noise. Then the variance of our estimate in the seasonal means model is just $\gamma_0/N$. For the cosine model, we use Equation (3.4.6), and Equation (3.4.4) and its sine equivalent, to obtain

$$\mathrm{Var}(\hat{\mu}_1) = \frac{\gamma_0}{n}\left\{1 + 2\left[\cos\left(\frac{\pi}{6}\right)\right]^2 + 2\left[\sin\left(\frac{\pi}{6}\right)\right]^2\right\} = \frac{3\gamma_0}{n}$$

since $(\cos\theta)^2 + (\sin\theta)^2 = 1$. Thus the ratio of the standard deviation in the cosine model to that in the seasonal means model is

$$\sqrt{\frac{3\gamma_0/n}{\gamma_0/N}} = \sqrt{\frac{3N}{n}}$$

In particular, for the monthly temperature series, we have $n = 144$ and $N = 12$; thus, the ratio is

$$\sqrt{\frac{3(12)}{144}} = 0.5$$

Thus, in the cosine model, we estimate the January effect with a standard deviation that is only half as large as it would be if we estimated with a seasonal means model, a substantial gain. (Of course, this assumes that the cosine trend plus white noise model is the correct model.)

Suppose now that the stochastic component is such that $\rho_1 \neq 0$ but $\rho_k = 0$ for $k > 1$. With a seasonal means model, the variance of the estimated January effect will be unchanged (see Equation (3.4.1) on page 36). For the cosine trend model, if we have a reasonably large sample size, we may use Equation (3.4.5), an identical expression for $\mathrm{Var}(\hat{\beta}_2)$, and Equation (3.2.3) on page 28 for $\mathrm{Var}(\hat{\beta}_0)$ to obtain

$$\mathrm{Var}(\hat{\mu}_1) = \frac{\gamma_0}{n}\left\{1 + 2\rho_1 + 2\left[1 + 2\rho_1\cos\left(\frac{2\pi}{12}\right)\right]\right\} = \frac{\gamma_0}{n}\left\{3 + 2\rho_1\left[1 + 2\cos\left(\frac{\pi}{6}\right)\right]\right\}$$

If $\rho_1 = -0.4$, then we have $0.814\gamma_0/n$, and the ratio of the standard deviation in the cosine case to the standard deviation in the seasonal means case is

$$\sqrt{\frac{0.814\gamma_0/n}{\gamma_0/N}} = \sqrt{\frac{0.814N}{n}}$$

If we take $n = 144$ and $N = 12$, the ratio is

$$\sqrt{\frac{0.814(12)}{144}} = 0.26$$

a very substantial reduction indeed!

† This assumes that 1/12 is a "Fourier frequency"; that is, it is of the form $m/n$. Otherwise, these estimates are only approximately uncorrelated.
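The two standard deviation ratios can be reproduced with a few lines of arithmetic. This base R sketch uses $n = 144$, $N = 12$, and $\rho_1 = -0.4$ from the text, together with the large-sample variance expressions; the variable names are hypothetical, and $\gamma_0$ cancels in the ratio so it is omitted.

```r
n <- 144; N <- 12                 # sample size and number of years from the text

# White noise: Var(mu1_hat) = 3 * gamma0 / n versus gamma0 / N
ratio_white <- sqrt((3 / n) / (1 / N))

# rho_1 = -0.4 and rho_k = 0 for k > 1, using the large-sample expressions:
# Var(beta0_hat) ~ (gamma0 / n)(1 + 2 rho_1) and Equation (3.4.5) for the others
rho1 <- -0.4
cosine_multiplier <- (1 + 2 * rho1) + 2 * (1 + 2 * rho1 * cos(2 * pi / 12))
ratio_ma <- sqrt((cosine_multiplier / n) / (1 / N))

print(round(c(white_noise = ratio_white, lag1 = ratio_ma,
              multiplier = cosine_multiplier), 3))
```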

We now turn to linear time trends. For these trends, an alternative formula to Equation (3.3.2) on page 30 for $\hat{\beta}_1$ is more convenient. It can be shown that the least squares estimate of the slope may be written

$$\hat{\beta}_1 = \frac{\sum_{t=1}^{n}(t - \bar{t})Y_t}{\sum_{t=1}^{n}(t - \bar{t})^2} \tag{3.4.7}$$

Since the estimate is a linear combination of $Y$-values, some progress can be made in evaluating its variance. We have

$$\mathrm{Var}(\hat{\beta}_1) = \frac{12\gamma_0}{n(n^2 - 1)}\left[1 + \frac{24}{n(n^2 - 1)}\sum_{s=2}^{n}\sum_{t=1}^{s-1}(t - \bar{t})(s - \bar{t})\rho_{s-t}\right] \tag{3.4.8}$$

where we have used $\sum_{t=1}^{n}(t - \bar{t})^2 = n(n^2 - 1)/12$. Again the double sum does not in general reduce.

To illustrate the effect of Equation (3.4.8), consider again the case where $\rho_1 \neq 0$ but $\rho_k = 0$ for $k > 1$. Then, after some algebraic manipulation, again involving the sum of consecutive integers and their squares, Equation (3.4.8) can be reduced to

$$\mathrm{Var}(\hat{\beta}_1) = \frac{12\gamma_0}{n(n^2 - 1)}\left[1 + 2\rho_1\left(1 - \frac{3}{n}\right)\right]$$

For large $n$, we can neglect the $3/n$ term and use

$$\mathrm{Var}(\hat{\beta}_1) \approx \frac{12\gamma_0(1 + 2\rho_1)}{n(n^2 - 1)} \tag{3.4.9}$$

If $\rho_1 = -0.4$, then $1 + 2\rho_1 = 0.2$, and then the variance of $\hat{\beta}_1$ is only 20% of what it would be if $\{X_t\}$ were white noise. Of course, if $\rho_1 > 0$, then the variance would be larger than for the white noise case.
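The algebraic reduction of Equation (3.4.8) can be confirmed numerically. The sketch below, in base R, evaluates the bracketed factor when only $\rho_1$ is nonzero and compares it with the closed form $1 + 2\rho_1(1 - 3/n)$ and with the large-$n$ factor $1 + 2\rho_1$. The value $n = 60$ and the function name `bracket_3_4_8` are assumptions made for illustration.

```r
# Bracketed factor of Equation (3.4.8) when only rho_1 is nonzero
bracket_3_4_8 <- function(n, rho1) {
  tbar <- (n + 1) / 2
  s <- 2:n                          # only pairs (t, s) with s - t = 1 contribute
  t <- s - 1
  1 + 24 / (n * (n^2 - 1)) * sum((t - tbar) * (s - tbar) * rho1)
}

n <- 60; rho1 <- -0.4               # illustrative values
exact  <- bracket_3_4_8(n, rho1)
closed <- 1 + 2 * rho1 * (1 - 3 / n)
approx <- 1 + 2 * rho1              # large-n factor in Equation (3.4.9)
print(c(exact = exact, closed = closed, approx = approx))
```

For moderate $n$ the exact factor is somewhat larger than the limiting value, so the 20% figure is a large-sample statement.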

We  turn  now  to  comparing  the  least  squares  estimates  with  the  so-called  best  linear 
unbiased  estimates  (BLUE)  or  the  generalized  least  squares  (GLS)  estimates.  If  the 
stochastic  component  {Xt}  is  not  white  noise,  estimates  of  the  unknown  parameters  in 
the  trend  function  may  be  made;  they  are  linear  functions  of  the  data,  are  unbiased,  and 
have  the  smallest  variances  among  all  such  estimates — the  so-called  BLUE  or  GLS 
estimates.  These  estimates  and  their  variances  can  be  expressed  fairly  explicitly  by 
using  certain  matrices  and  their  inverses.  (Details  may  be  found  in  Draper  and  Smith 
(1981).)  However,  constructing  these  estimates  requires  complete  knowledge  of  the 
covariance  function  of  the  stochastic  component,  a  function  that  is  unknown  in  virtually 
all real applications. It is possible to iteratively estimate the covariance function for $\{X_t\}$ based on a preliminary estimate of the trend. The trend is then estimated again using the estimated covariance function for $\{X_t\}$ and thus iterated to an approximate BLUE for the trend. These methods are pursued further in Chapter 11.

Fortunately,  there  are  some  results  based  on  large  sample  sizes  that  support  the  use 
of  the  simpler  least  squares  estimates  for  the  types  of  trends  that  we  have  considered.  In 
particular, we have the following result (see Fuller (1996), pp. 476-480, for more details): We assume that the trend is either a polynomial in time, a trigonometric polynomial, seasonal means, or a linear combination of these. Then, for a very general stationary stochastic component $\{X_t\}$, the least squares estimates for the trend have the same variance as the best linear unbiased estimates for large sample sizes.

Although  the  simple  least  squares  estimates  may  be  asymptotically  efficient,  it  does 
not  follow  that  the  estimated  standard  deviations  of  the  coefficients  as  printed  out  by  all 
regression  routines  are  correct.  We  shall  elaborate  on  this  point  in  the  next  section.  We 
also  caution  the  reader  that  the  result  above  is  restricted  to  certain  kinds  of  trends  and 
cannot,  in  general,  be  extended  to  regression  on  arbitrary  predictor  variables,  such  as 
other time series. For example, Fuller (1996, pp. 518-522) shows that if $Y_t = \beta Z_t + X_t$, where $\{X_t\}$ has a simple stochastic structure but $\{Z_t\}$ is also a stationary series, then the least squares estimate of $\beta$ can be very inefficient and biased even for large samples.

3.5  Interpreting  Regression  Output 


We have already noted that the standard regression routines calculate least squares estimates of the unknown regression coefficients, the betas. As such, the estimates are reasonable under minimal assumptions on the stochastic component $\{X_t\}$. However, some of the properties of the regression output depend heavily on the usual regression assumption that $\{X_t\}$ is white noise, and some depend on the further assumption that $\{X_t\}$ is approximately normally distributed. We begin with the items that depend least on the assumptions.

Consider the regression output shown in Exhibit 3.7. We shall write $\hat{\mu}_t$ for the estimated trend regardless of the assumed parametric form for $\mu_t$. For example, for the linear time trend, we have $\hat{\mu}_t = \hat{\beta}_0 + \hat{\beta}_1 t$. For each $t$, the unobserved stochastic component $X_t$ can be estimated (predicted) by $Y_t - \hat{\mu}_t$. If the $\{X_t\}$ process has constant variance, then we can estimate the standard deviation of $X_t$, namely $\sqrt{\gamma_0}$, by the residual standard deviation

$$s = \sqrt{\frac{1}{n - p}\sum_{t=1}^{n}(Y_t - \hat{\mu}_t)^2} \tag{3.5.1}$$

where $p$ is the number of parameters estimated in $\mu_t$ and $n - p$ is the so-called degrees of freedom for $s$. The value of $s$ gives an absolute measure of the goodness of fit of the estimated trend: the smaller the value of $s$, the better the fit. However, a value of $s$ of, say, 60.74 is somewhat difficult to interpret.

A unitless measure of the goodness of fit of the trend is the value of $R^2$, also called the coefficient of determination or multiple R-squared. One interpretation of $R^2$ is that it is the square of the sample correlation coefficient between the observed series and the estimated trend. It is also the fraction of the variation in the series that is explained by the estimated trend. Exhibit 3.7 is a more complete regression output when fitting the straight line to the random walk data. This extends what we saw in Exhibit 3.1 on page 31.
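Both quantities are easy to compute by hand and compare with what the software reports. The following base R sketch uses a simulated random walk (not the book's `rwalk` data) and hypothetical variable names; it evaluates Equation (3.5.1) and the squared-correlation interpretation of $R^2$.

```r
set.seed(4)                                   # arbitrary seed
n <- 60
time_index <- 1:n
y <- cumsum(rnorm(n))                         # simulated random walk, not the book's data

fit <- lm(y ~ time_index)
p <- length(coef(fit))                        # number of trend parameters (2 here)

s <- sqrt(sum(residuals(fit)^2) / (n - p))    # Equation (3.5.1)
r2 <- cor(y, fitted(fit))^2                   # squared correlation with fitted trend

print(c(s = s, sigma = summary(fit)$sigma))
print(c(r2 = r2, r.squared = summary(fit)$r.squared))
```

Each hand-computed value should match the corresponding entry of `summary(fit)`.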


Exhibit 3.7 Regression Output for Linear Trend Fit of Random Walk

            Estimate    Std. Error   t-value   Pr(>|t|)
Intercept   -1.007888   0.297245     -3.39     0.00126
Time         0.134087   0.008475     15.82     < 0.0001

Residual standard error   1.137   with 58 degrees of freedom
Multiple R-Squared        0.812
Adjusted R-squared        0.809
F-statistic               250.3   with 1 and 58 df; p-value < 0.0001

> model1=lm(rwalk~time(rwalk))
> summary(model1)


According to Exhibit 3.7, about 81% of the variation in the random walk series is explained by the linear time trend. The adjusted R-squared value is a small adjustment to $R^2$ that yields an approximately unbiased estimate based on the number of parameters estimated in the trend. It is useful for comparing models with different numbers of parameters. Various formulas for computing $R^2$ may be found in any book on regression, such as Draper and Smith (1981). The standard deviations of the coefficients labeled Std. Error on the output need to be interpreted carefully. They are appropriate only when the stochastic component is white noise, the usual regression assumption. For example, in Exhibit 3.7 the value 0.008475 is obtained from the square root of the value given by Equation (3.4.8) when $\rho_k = 0$ for $k > 0$ and with $\gamma_0$ estimated by $s^2$, that is, to within rounding,

$$0.008475 = \sqrt{\frac{12(1.137)^2}{60(60^2 - 1)}}$$

The important point is that these standard deviations assume a white noise stochastic component, an assumption that will rarely be true for time series.
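The arithmetic behind the reported slope standard error can be verified in one line of R. The values 1.137 and 58 degrees of freedom come from Exhibit 3.7; taking $n = 60$ assumes two estimated parameters.

```r
s <- 1.137                          # residual standard error from Exhibit 3.7
n <- 60                             # 58 degrees of freedom plus 2 parameters
# White noise case of Equation (3.4.8) with gamma0 estimated by s^2
se_slope <- sqrt(12 * s^2 / (n * (n^2 - 1)))
print(se_slope)                     # compare with 0.008475 in Exhibit 3.7
```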

The t-values or t-ratios shown in Exhibit 3.7 are just the estimated regression coefficients, each divided by their respective standard errors. If the stochastic component is normally distributed white noise, then these ratios provide appropriate test statistics for checking the significance of the regression coefficients. In each case, the null hypothesis is that the corresponding unknown regression coefficient is zero. The significance levels and p-values are determined from the t-distribution with $n - p$ degrees of freedom.

3.6  Residual  Analysis 


As  we  have  already  noted,  the  unobserved  stochastic  component  {Xt}  can  be  estimated, 
or  predicted,  by  the  residual 

$$\hat{X}_t = Y_t - \hat{\mu}_t \tag{3.6.1}$$

Predicted is really a better term. We reserve the term estimate for the guess of an unknown parameter and the term predictor for an estimate of an unobserved random variable. We call $\hat{X}_t$ the residual corresponding to the $t$th observation. If the trend model is reasonably correct, then the residuals should behave roughly like the true stochastic component, and various assumptions about the stochastic component can be assessed by looking at the residuals. If the stochastic component is white noise, then the residuals should behave roughly like independent (normal) random variables with zero mean and standard deviation $s$. Since a least squares fit of any trend containing a constant term automatically produces residuals with a zero mean, we might consider standardizing the residuals as $\hat{X}_t/s$. However, most statistics software will produce standardized residuals using a more complicated standard error in the denominator that takes into account the specific regression model being fit.
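The difference between the simple ratio $\hat{X}_t/s$ and the standardized residuals produced by software can be seen in a small base R sketch. The monthly series here is simulated, and the names `naive`, `internal`, and `external` are hypothetical labels. In R, `rstandard()` divides each residual by $s\sqrt{1 - h_t}$, where $h_t$ is the leverage, and `rstudent()` additionally re-estimates $s$ with observation $t$ left out.

```r
set.seed(5)                                   # arbitrary seed
n <- 144
month <- factor(rep(month.name, length.out = n), levels = month.name)
y <- rep(seq(20, 75, length.out = 12), length.out = n) + rnorm(n, sd = 3)

fit <- lm(y ~ month)                          # seasonal means with an intercept
s <- summary(fit)$sigma

naive <- residuals(fit) / s                   # residuals divided by s only
internal <- rstandard(fit)                    # also divides by sqrt(1 - leverage)
external <- rstudent(fit)                     # same, with s re-estimated without case t

h <- hatvalues(fit)                           # leverages
print(all.equal(internal, residuals(fit) / (s * sqrt(1 - h))))
print(round(cbind(naive, internal, external)[1:3, ], 3))
```

For a balanced seasonal means design the leverages are all equal, so the three versions differ only slightly; the differences can be larger for other trend models.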

With the residuals or standardized residuals in hand, the next step is to examine various residual plots. We first look at the plot of the residuals over time. If the data are
possibly  seasonal,  we  should  use  plotting  symbols  as  we  did  in  Exhibit  1.9  on  page  7,  so 
that  residuals  associated  with  the  same  season  can  be  identified  easily. 

We  will  use  the  monthly  average  temperature  series  which  we  fitted  with  seasonal 
means  as  our  first  example  to  illustrate  some  of  the  ideas  of  residual  analysis.  Exhibit 
1.7  on  page  6  shows  the  time  series  plot  of  that  series.  Exhibit  3.8  shows  a  time  series 
plot  for  the  standardized  residuals  of  the  monthly  temperature  data  fitted  by  seasonal 
means.  If  the  stochastic  component  is  white  noise  and  the  trend  is  adequately  modeled, 
we  would  expect  such  a  plot  to  suggest  a  rectangular  scatter  with  no  discernible  trends 
whatsoever. There are no striking departures from randomness apparent in this display. Exhibit 3.9 repeats the time series plot but now with seasonal plotting symbols. Again there are no apparent patterns relating to different months of the year.

Next we look at the standardized residuals versus the corresponding trend estimate,
or  fitted  value,  as  in  Exhibit  3.10.  Once  more  we  are  looking  for  patterns.  Are  small 
residuals  associated  with  small  fitted  trend  values  and  large  residuals  with  large  fitted 
trend  values?  Is  there  less  variation  for  residuals  associated  with  certain  sized  fitted 
trend  values  or  more  variation  with  other  fitted  trend  values?  There  is  somewhat  more 
variation  for  the  March  residuals  and  less  for  November,  but  Exhibit  3.10  certainly  does 
not  indicate  any  dramatic  patterns  that  would  cause  us  to  doubt  the  seasonal  means 
model. 


Exhibit 3.10 Standardized Residuals versus Fitted Values for the Temperature Seasonal Means Model

> plot(y=rstudent(model3),x=as.vector(fitted(model3)),
   xlab='Fitted Trend Values',
   ylab='Standardized Residuals',type='n')
> points(y=rstudent(model3),x=as.vector(fitted(model3)),
   pch=as.vector(season(tempdub)))


Gross nonnormality can be assessed by plotting a histogram of the residuals or
standardized residuals. Exhibit 3.11 displays a frequency histogram of the standardized
residuals from the seasonal means model for the temperature series. The plot is somewhat
symmetric and tails off at both the high and low ends as a normal distribution does.

Exhibit 3.11 Histogram of Standardized Residuals from Seasonal
Means Model

[Histogram; horizontal axis: Standardized Residuals, from -3 to 3]

> hist(rstudent(model3), xlab='Standardized Residuals')

Normality can be checked more carefully by plotting the so-called normal scores or
quantile-quantile (QQ) plot. Such a plot displays the quantiles of the data versus the
theoretical quantiles of a normal distribution. With normally distributed data, the QQ plot
looks approximately like a straight line. Exhibit 3.12 shows the QQ normal scores plot
for the standardized residuals from the seasonal means model for the temperature series.
The straight-line pattern here supports the assumption of a normally distributed
stochastic component in this model.

Exhibit 3.12 Q-Q Plot: Standardized Residuals of Seasonal Means Model

[Normal Q-Q plot; horizontal axis: Theoretical Quantiles]

> win.graph(width=2.5, height=2.5, pointsize=8)
> qqnorm(rstudent(model3))

An excellent test of normality is known as the Shapiro-Wilk test.† It essentially
calculates the correlation between the residuals and the corresponding normal quantiles.
The lower this correlation, the more evidence we have against normality. Applying that
test to these residuals gives a test statistic of W = 0.9929 with a p-value of 0.6954. We
cannot reject the null hypothesis that the stochastic component of this model is normally
distributed.
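The idea behind the test can be sketched in base R. The example below uses simulated normal data as a stand-in for the residuals; the seed, sample size, and object names are assumptions made for illustration and are not taken from the temperature series. It computes the correlation between the ordered values and the normal quantiles, which is the quantity a Q-Q plot displays, and then calls `shapiro.test` from the stats package for the formal test.

```r
# Simulated stand-in for standardized residuals (hypothetical data)
set.seed(123)
resid_sim <- rnorm(144)

# Correlation between the data and the theoretical normal quantiles
qq <- qqnorm(resid_sim, plot.it = FALSE)   # returns the plotted coordinates
qq_cor <- cor(qq$x, qq$y)

# Formal Shapiro-Wilk test of normality
sw <- shapiro.test(resid_sim)

print(qq_cor)
print(sw)
```

With the fitted model from the text, the same call would presumably be applied to `rstudent(model3)` instead of the simulated vector.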

Independence in the stochastic component can be tested in several ways. The runs
test examines the residuals in sequence to look for patterns that would give
evidence against independence. Runs above or below their median are counted. A small
number of runs would indicate that neighboring residuals are positively dependent and
tend to “hang together” over time. On the other hand, too many runs would indicate that
the residuals oscillate back and forth across their median. Then neighboring residuals
are negatively dependent. So either too few or too many runs lead us to reject
independence. Performing a runs test‡ on these residuals produces the following values:
observed runs = 65, expected runs = 72.875, which leads to a p-value of 0.216, so we
cannot reject independence of the stochastic component in this seasonal means model.
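The counting logic of a runs test can be sketched as follows. This is a minimal illustration, not the implementation behind the `runs` function used in the text: the helper name `runs_test_sketch` is hypothetical, the cutoff defaults to the median, and the p-value uses the large-sample normal approximation, so it need not match output from other software exactly.

```r
# Hypothetical helper: runs test about a cutoff, normal approximation
runs_test_sketch <- function(x, cutoff = median(x)) {
  above <- x > cutoff
  n1 <- sum(above)                  # number of values above the cutoff
  n2 <- sum(!above)                 # number of values at or below the cutoff
  n <- n1 + n2
  # A new run starts whenever the above/below indicator changes
  observed <- 1 + sum(diff(as.integer(above)) != 0)
  expected <- 2 * n1 * n2 / n + 1
  variance <- 2 * n1 * n2 * (2 * n1 * n2 - n) / (n^2 * (n - 1))
  z <- (observed - expected) / sqrt(variance)
  list(observed = observed, expected = expected,
       p.value = 2 * pnorm(-abs(z)))
}

# Usage on simulated white noise (hypothetical data)
set.seed(1)
x <- rnorm(144)
print(runs_test_sketch(x, cutoff = 0))
```

Too few runs give a large negative z and too many give a large positive z; the two-sided p-value treats both as evidence against independence.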

The  Sample  Autocorrelation  Function 

Another very important diagnostic tool for examining dependence is the sample
autocorrelation function. Consider any sequence of data $Y_1, Y_2, \ldots, Y_n$, whether residuals,
standardized residuals, original data, or some transformation of data. Tentatively
assuming stationarity, we would like to estimate the autocorrelation function $\rho_k$ for a variety of
lags $k = 1, 2, \ldots$. The obvious way to do this is to compute the sample correlation
between the pairs $k$ units apart in time, that is, among $(Y_1, Y_{1+k})$, $(Y_2, Y_{2+k})$,
$(Y_3, Y_{3+k})$, ..., and $(Y_{n-k}, Y_n)$. However, we modify this slightly, taking into account
that we are assuming stationarity, which implies a common mean and variance for the
series. With this in mind, we define the sample autocorrelation function, $r_k$, at lag $k$ as

$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} \quad \text{for } k = 1, 2, \ldots \tag{3.6.2}$$

Notice that we used the “grand mean,” $\bar{Y}$, in all places and have also divided by the
“grand sum of squares” rather than the product of the two separate standard deviations
used in the ordinary correlation coefficient. We also note that the denominator is a sum
of $n$ squared terms while the numerator contains only $n-k$ cross products. For a variety
of reasons, this has become the standard definition for the sample autocorrelation
function. A plot of $r_k$ versus lag $k$ is often called a correlogram.
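Equation (3.6.2) can be computed directly and checked against R's built-in `acf` function. The sketch below uses a simulated series; the function name `sample_acf` and the data are assumptions made for illustration.

```r
# Hypothetical helper implementing Equation (3.6.2) term by term
sample_acf <- function(y, max_lag) {
  n <- length(y)
  ybar <- mean(y)                       # the "grand mean"
  denom <- sum((y - ybar)^2)            # the "grand sum of squares", n terms
  sapply(1:max_lag, function(k) {
    # numerator: n - k cross products of values k time units apart
    sum((y[(k + 1):n] - ybar) * (y[1:(n - k)] - ybar)) / denom
  })
}

set.seed(1)
y <- rnorm(50)                          # simulated data (hypothetical)
r_manual <- sample_acf(y, max_lag = 5)
r_builtin <- acf(y, lag.max = 5, plot = FALSE)$acf[2:6]   # drop lag 0
print(cbind(lag = 1:5, r_manual, r_builtin))
```

The two columns should agree, since `acf` with its default settings uses the same grand mean and the same denominator.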


† Royston, P. (1982) “An Extension of Shapiro and Wilk's W Test for Normality to Large
Samples.” Applied Statistics, 31, 115-124.

‡ R code: runs(rstudent(model3))

In our present context, we are interested in discovering possible dependence in the
stochastic component; therefore the sample autocorrelation function for the
standardized residuals is of interest. Exhibit 3.13 displays the sample autocorrelation for the
standardized residuals from the seasonal means model of the temperature series. All
values are within the horizontal dashed lines, which are placed at zero plus and minus two
approximate standard errors of the sample autocorrelations, namely $\pm 2/\sqrt{n}$. The values
of $r_k$ are, of course, estimates of $\rho_k$. As such, they have their own sampling distributions,
standard errors, and other properties. For now we shall use $r_k$ as a descriptive tool and
defer discussion of those topics until Chapters 6 and 8. According to Exhibit 3.13, for
$k = 1, 2, \ldots, 21$, none of the hypotheses $\rho_k = 0$ can be rejected at the usual significance
levels, and it is reasonable to infer that the stochastic component of the series is white
noise.
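The dashed-line check can be reproduced numerically. The following sketch uses simulated white noise of length 144 as a stand-in for the residuals (the data, seed, and variable names are assumptions), computes the bounds at two approximate standard errors, and lists any lags that fall outside them.

```r
set.seed(42)
n <- 144
wn <- rnorm(n)                                  # simulated white noise (hypothetical data)

# Sample autocorrelations at lags 1 to 21 (lag 0 dropped)
r <- acf(wn, lag.max = 21, plot = FALSE)$acf[-1]

bound <- 2 / sqrt(n)                            # two approximate standard errors
outside <- which(abs(r) > bound)                # lags outside the bounds, if any

print(bound)
print(outside)
```

Even for genuine white noise, roughly one lag in twenty can be expected to stray outside the bounds by chance, so an isolated small exceedance is weak evidence on its own.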


Exhibit 3.13 Sample Autocorrelation of Residuals of Seasonal Means
Model

> win.graph(width=4.875, height=3, pointsize=8)
> acf(rstudent(model3))


As  a  second  example  consider  the  standardized  residuals  from  fitting  a  straight  line 
to  the  random  walk  time  series.  Recall  Exhibit  3.2  on  page  31,  which  shows  the  data  and 
fitted line. A time series plot of the standardized residuals is shown in Exhibit 3.14.

Exhibit 3.14 Residuals from Straight Line Fit of the Random Walk

> plot(y=rstudent(model1), x=as.vector(time(rwalk)),
    ylab='Standardized Residuals', xlab='Time', type='o')


In  this  plot,  the  residuals  “hang  together”  too  much  for  white  noise — the  plot  is  too 
smooth.  Furthermore,  there  seems  to  be  more  variation  in  the  last  third  of  the  series  than 
in the first two-thirds. Exhibit 3.15 shows a similar effect with larger residuals
associated with larger fitted values.

Exhibit 3.15 Residuals versus Fitted Values from Straight Line Fit

> win.graph(width=4.875, height=3, pointsize=8)
> plot(y=rstudent(model1), x=fitted(model1),
    ylab='Standardized Residuals', xlab='Fitted Trend Line Values',
    type='p')

The sample autocorrelation function of the standardized residuals, shown in Exhibit
3.16, confirms the smoothness of the time series plot that we observed in Exhibit 3.14.
The lag 1 and lag 2 autocorrelations exceed two standard errors above zero, and the lag 5
and lag 6 autocorrelations are more than two standard errors below zero. This is not what
we expect from a white noise process.
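The same behavior can be seen on a freshly simulated random walk. The sketch below does not use the `rwalk` data from the text; the simulated series, its length, and the object names are assumptions. It fits a straight line by least squares and compares the sample autocorrelations of the standardized residuals with the two-standard-error bound.

```r
set.seed(2024)
n <- 200
rw <- cumsum(rnorm(n))                 # simulated random walk (hypothetical data)
time_index <- 1:n

fit <- lm(rw ~ time_index)             # straight-line trend fitted by least squares
res <- rstudent(fit)                   # standardized residuals, as in the text

r <- acf(res, lag.max = 10, plot = FALSE)$acf[-1]   # lags 1 to 10
bound <- 2 / sqrt(n)

print(round(r, 3))
print(r[1] > bound)                    # is the lag 1 autocorrelation outside the bound?
```

Removing a linear trend does not remove the dependence built into a random walk, which is why the residuals remain strongly autocorrelated at low lags.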


Exhibit 3.16 Sample Autocorrelation of Residuals from Straight Line
Model

[Sample ACF plot; horizontal axis: Lag]

> acf(rstudent(model1))

Finally, we return to the annual rainfall in Los Angeles shown in Exhibit 1.1 on
page 2. We found no evidence of dependence in that series, but we now look for
evidence against normality. Exhibit 3.17 displays the normal quantile-quantile plot for that
series. We see considerable curvature in the plot. A line passing through the first and
third normal quartiles helps point out the departure from a straight line in the plot.

Exhibit 3.17 Quantile-Quantile Plot of Los Angeles Annual Rainfall Series

[Normal Q-Q plot; horizontal axis: Theoretical Quantiles]

> win.graph(width=2.5, height=2.5, pointsize=8)
> qqnorm(larain); qqline(larain)


3.7  Summary 


This  chapter  is  concerned  with  describing,  modeling,  and  estimating  deterministic 
trends  in  time  series.  The  simplest  deterministic  “trend”  is  a  constant-mean  function. 
Methods  of  estimating  a  constant  mean  were  given  but,  more  importantly,  assessment  of 
the  accuracy  of  the  estimates  under  various  conditions  was  considered.  Regression 
methods were then pursued to estimate trends that are linear or quadratic in time. Methods
for modeling cyclical or seasonal trends came next, and the reliability and efficiency
of all of these regression methods were investigated. The final section began our study
of residual analysis to investigate the quality of the fitted model. This section also
introduced the important sample autocorrelation function, which we will revisit throughout
the  remainder  of  the  book. 


Exercises 


3.1 Verify Equation (3.3.2) on page 30, for the least squares estimates of $\beta_0$ and of $\beta_1$
when the model $Y_t = \beta_0 + \beta_1 t + X_t$ is considered.

3.2 Suppose $Y_t = \mu + e_t - e_{t-1}$. Find $\mathrm{Var}(\bar{Y})$. Note any unusual results. In particular,
compare your answer to what would have been obtained if $Y_t = \mu + e_t$. (Hint: You
may avoid Equation (3.2.3) on page 28 by first doing some algebraic simplification
on $\sum_{t=1}^{n} (e_t - e_{t-1})$.)

3.3 Suppose $Y_t = \mu + e_t + e_{t-1}$. Find $\mathrm{Var}(\bar{Y})$. Compare your answer to what would
have been obtained if $Y_t = \mu + e_t$. Describe the effect that the autocorrelation in
$\{Y_t\}$ has on $\mathrm{Var}(\bar{Y})$.

3.4  The  data  file  hours  contains  monthly  values  of  the  average  hours  worked  per 
week  in  the  U.S.  manufacturing  sector  for  July  1982  through  June  1987. 

(a)  Display  and  interpret  the  time  series  plot  for  these  data. 

(b)  Now  construct  a  time  series  plot  that  uses  separate  plotting  symbols  for  the 
various  months.  Does  your  interpretation  change  from  that  in  part  (a)? 

3.5 The data file wages contains monthly values of the average hourly wages (in
dollars) for workers in the U.S. apparel and textile products industry for July 1981
through June 1987.

(a) Display and interpret the time series plot for these data.

(b) Use least squares to fit a linear time trend to this time series. Interpret the
regression output. Save the standardized residuals from the fit for further
analysis.

(c) Construct and interpret the time series plot of the standardized residuals from
part (b).

(d) Use least squares to fit a quadratic time trend to the wages time series.
Interpret the regression output. Save the standardized residuals from the fit for
further analysis.

(e) Construct and interpret the time series plot of the standardized residuals from
part (d).

3.6 The data file beersales contains monthly U.S. beer sales (in millions of barrels)
for the period January 1975 through December 1990.

(a) Display and interpret the time series plot for these data.

(b) Now construct a time series plot that uses separate plotting symbols for the
various months. Does your interpretation change from that in part (a)?

(c) Use least squares to fit a seasonal-means trend to this time series. Interpret the
regression output. Save the standardized residuals from the fit for further
analysis.

(d) Construct and interpret the time series plot of the standardized residuals from
part (c). Be sure to use proper plotting symbols to check on seasonality in the
standardized residuals.

(e) Use least squares to fit a seasonal-means plus quadratic time trend to the beer
sales time series. Interpret the regression output. Save the standardized
residuals from the fit for further analysis.

(f) Construct and interpret the time series plot of the standardized residuals from
part (e). Again use proper plotting symbols to check for any remaining
seasonality in the residuals.

3.7 The data file winnebago contains monthly unit sales of recreational vehicles from
Winnebago, Inc., from November 1966 through February 1972.

(a) Display and interpret the time series plot for these data.

(b) Use least squares to fit a line to these data. Interpret the regression output. Plot
the standardized residuals from the fit as a time series. Interpret the plot.

(c) Now take natural logarithms of the monthly sales figures and display and
interpret the time series plot of the transformed values.

(d) Use least squares to fit a line to the logged data. Display and interpret the time
series plot of the standardized residuals from this fit.

(e) Now use least squares to fit a seasonal-means plus linear time trend to the
logged sales time series and save the standardized residuals for further
analysis. Check the statistical significance of each of the regression coefficients in
the model.

(f) Display the time series plot of the standardized residuals obtained in part (e).
Interpret the plot.

3.8  The  data  file  retail  lists  total  U.K.  (United  Kingdom)  retail  sales  (in  billions  of 
pounds)  from  January  1986  through  March  2007.  The  data  are  not  “seasonally 
adjusted,”  and  year  2000  =  100  is  the  base  year. 

(a)  Display  and  interpret  the  time  series  plot  for  these  data.  Be  sure  to  use  plotting 
symbols  that  permit  you  to  look  for  seasonality. 

(b)  Use  least  squares  to  fit  a  seasonal-means  plus  linear  time  trend  to  this  time 
series.  Interpret  the  regression  output  and  save  the  standardized  residuals  from 
the  fit  for  further  analysis. 

(c)  Construct  and  interpret  the  time  series  plot  of  the  standardized  residuals  from 
part  (b).  Be  sure  to  use  proper  plotting  symbols  to  check  on  seasonality. 

3.9 The data file prescrip gives monthly U.S. prescription costs for the months
August 1986 to March 1992. These data are from the State of New Jersey’s
Prescription Drug Program and are the cost per prescription claim.

(a) Display and interpret the time series plot for these data. Use plotting symbols
that permit you to look for seasonality.

(b) Calculate and plot the sequence of month-to-month percentage changes in the
prescription costs. Again, use plotting symbols that permit you to look for
seasonality.

(c) Use least squares to fit a cosine trend with fundamental frequency 1/12 to the
percentage change series. Interpret the regression output. Save the
standardized residuals.

(d) Plot the sequence of standardized residuals to investigate the adequacy of the
cosine trend model. Interpret the plot.

3.10 (Continuation of Exercise 3.4) Consider the hours time series again.

(a) Use least squares to fit a quadratic trend to these data. Interpret the regression
output and save the standardized residuals for further analysis.

(b) Display a sequence plot of the standardized residuals and interpret. Use
monthly plotting symbols so that possible seasonality may be readily
identified.

(c) Perform the runs test of the standardized residuals and interpret the results.

(d) Calculate and interpret the sample autocorrelations for the standardized
residuals.

(e) Investigate the normality of the standardized residuals (error terms). Consider
histograms and normal probability plots. Interpret the plots.

3.11 (Continuation of Exercise 3.5) Return to the wages series.

(a) Consider the residuals from a least squares fit of a quadratic time trend.

(b) Perform a runs test on the standardized residuals and interpret the results.

(c) Calculate and interpret the sample autocorrelations for the standardized
residuals.

(d) Investigate the normality of the standardized residuals (error terms). Consider
histograms and normal probability plots. Interpret the plots.

3.12 (Continuation of Exercise 3.6) Consider the time series in the data file beersales.

(a) Obtain the residuals from the least squares fit of the seasonal-means plus
quadratic time trend model.

(b) Perform a runs test on the standardized residuals and interpret the results.

(c) Calculate and interpret the sample autocorrelations for the standardized
residuals.

(d) Investigate the normality of the standardized residuals (error terms). Consider
histograms and normal probability plots. Interpret the plots.

3.13 (Continuation of Exercise 3.7) Return to the winnebago time series.

(a) Calculate the least squares residuals from a seasonal-means plus linear time
trend model on the logarithms of the sales time series.

(b) Perform a runs test on the standardized residuals and interpret the results.

(c) Calculate and interpret the sample autocorrelations for the standardized
residuals.

(d) Investigate the normality of the standardized residuals (error terms). Consider
histograms and normal probability plots. Interpret the plots.

3.14 (Continuation of Exercise 3.8) The data file retail contains U.K. monthly retail
sales figures.

(a) Obtain the least squares residuals from a seasonal-means plus linear time
trend model.

(b) Perform a runs test on the standardized residuals and interpret the results.

(c) Calculate and interpret the sample autocorrelations for the standardized
residuals.

(d) Investigate the normality of the standardized residuals (error terms). Consider
histograms and normal probability plots. Interpret the plots.
3.15 (Continuation of Exercise 3.9) Consider again the prescrip time series.

(a) Save the standardized residuals from a least squares fit of a cosine trend with
fundamental frequency 1/12 to the percentage change time series.

(b) Perform a runs test on the standardized residuals and interpret the results.

(c) Calculate and interpret the sample autocorrelations for the standardized
residuals.

(d) Investigate the normality of the standardized residuals (error terms). Consider
histograms and normal probability plots. Interpret the plots.

3.16 Suppose that a stationary time series, $\{Y_t\}$, has an autocorrelation function of the
form $\rho_k = \phi^k$ for $k > 0$, where $\phi$ is a constant in the range $(-1, +1)$.

(a) Show that

$$\mathrm{Var}(\bar{Y}) = \frac{\gamma_0}{n}\left[\frac{1+\phi}{1-\phi} - \frac{2\phi}{n}\,\frac{(1-\phi^n)}{(1-\phi)^2}\right].$$

(Hint: Use Equation (3.2.3) on page 28, the finite geometric sum
$\sum_{k=0}^{n} \phi^k = \frac{1-\phi^{n+1}}{1-\phi}$, and the related sum
$\sum_{k=0}^{n} k\phi^{k-1} = \frac{d}{d\phi}\left[\sum_{k=0}^{n} \phi^k\right]$.)

(b) If $n$ is large, argue that

$$\mathrm{Var}(\bar{Y}) \approx \frac{\gamma_0}{n}\left[\frac{1+\phi}{1-\phi}\right].$$

(c) Plot $(1+\phi)/(1-\phi)$ for $\phi$ over the range -1 to +1. Interpret the plot in terms
of the precision in estimating the process mean.

3.17 Verify Equation (3.2.6) on page 29. (Hint: You will need the fact that

$$\sum_{k=0}^{\infty} \phi^k = \frac{1}{1-\phi}$$

for $-1 < \phi < +1$.)

3.18 Verify Equation (3.2.7) on page 30. (Hint: You will need the two sums

$$\sum_{t=1}^{n} t = \frac{n(n+1)}{2} \quad \text{and} \quad \sum_{t=1}^{n} t^2 = \frac{n(n+1)(2n+1)}{6}.)$$

Chapter  4 

Models  for  Stationary  Time  Series 


This  chapter  discusses  the  basic  concepts  of  a  broad  class  of  parametric  time  series 
models — the  autoregressive  moving  average  (ARMA)  models.  These  models  have 
assumed  great  importance  in  modeling  real-world  processes. 

4.1  General  Linear  Processes 


We will always let $\{Y_t\}$ denote the observed time series. From here on we will also let
$\{e_t\}$ represent an unobserved white noise series, that is, a sequence of identically
distributed, zero-mean, independent random variables. For much of our work, the assumption
of independence could be replaced by the weaker assumption that the $\{e_t\}$ are
uncorrelated random variables, but we will not pursue that slight generality.

A general linear process, $\{Y_t\}$, is one that can be represented as a weighted linear
combination of present and past white noise terms as

$$Y_t = e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots \tag{4.1.1}$$

If the right-hand side of this expression is truly an infinite series, then certain conditions
must be placed on the $\psi$-weights for the right-hand side to be meaningful
mathematically. For our purposes, it suffices to assume that

$$\sum_{i=1}^{\infty} \psi_i^2 < \infty \tag{4.1.2}$$

We should also note that since $\{e_t\}$ is unobservable, there is no loss in the generality of
Equation (4.1.2) if we assume that the coefficient on $e_t$ is 1; effectively, $\psi_0 = 1$.

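An important nontrivial example to which we will return often is the case where the
$\psi$'s form an exponentially decaying sequence

$$\psi_j = \phi^j$$

where $\phi$ is a number strictly between -1 and +1. Then

$$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots$$

For this example,

$$E(Y_t) = E(e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots) = 0$$

so that $\{Y_t\}$ has a constant mean of zero. Also,

$$\begin{aligned}
\mathrm{Var}(Y_t) &= \mathrm{Var}(e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots) \\
&= \mathrm{Var}(e_t) + \phi^2 \mathrm{Var}(e_{t-1}) + \phi^4 \mathrm{Var}(e_{t-2}) + \cdots \\
&= \sigma_e^2 (1 + \phi^2 + \phi^4 + \cdots) \\
&= \frac{\sigma_e^2}{1 - \phi^2} \quad \text{(by summing a geometric series)}
\end{aligned}$$

Furthermore,

$$\begin{aligned}
\mathrm{Cov}(Y_t, Y_{t-1}) &= \mathrm{Cov}(e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots,\; e_{t-1} + \phi e_{t-2} + \phi^2 e_{t-3} + \cdots) \\
&= \mathrm{Cov}(\phi e_{t-1}, e_{t-1}) + \mathrm{Cov}(\phi^2 e_{t-2}, \phi e_{t-2}) + \cdots \\
&= \phi\sigma_e^2 + \phi^3\sigma_e^2 + \phi^5\sigma_e^2 + \cdots \\
&= \phi\sigma_e^2 (1 + \phi^2 + \phi^4 + \cdots) \\
&= \frac{\phi\sigma_e^2}{1 - \phi^2} \quad \text{(again summing a geometric series)}
\end{aligned}$$

Thus

$$\mathrm{Corr}(Y_t, Y_{t-1}) = \left[\frac{\phi\sigma_e^2}{1 - \phi^2}\right] \Big/ \left[\frac{\sigma_e^2}{1 - \phi^2}\right] = \phi$$

In a similar manner, we can find $\mathrm{Cov}(Y_t, Y_{t-k}) = \dfrac{\phi^k \sigma_e^2}{1 - \phi^2}$
and thus

$$\mathrm{Corr}(Y_t, Y_{t-k}) = \phi^k \tag{4.1.3}$$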


It is important to note that the process defined in this way is stationary: the
autocovariance structure depends only on time lag and not on absolute time. For a general
linear process, $Y_t = e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots$, calculations similar to those done above
yield the following results:

$$E(Y_t) = 0, \qquad \gamma_k = \mathrm{Cov}(Y_t, Y_{t-k}) = \sigma_e^2 \sum_{i=0}^{\infty} \psi_i \psi_{i+k}, \quad k \ge 0 \tag{4.1.4}$$

with $\psi_0 = 1$. A process with a nonzero mean $\mu$ may be obtained by adding $\mu$ to the
right-hand side of Equation (4.1.1). Since the mean does not affect the covariance
properties of a process, we assume a zero mean until we begin fitting models to data.
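Equation (4.1.4) can be checked numerically for the exponentially decaying weights $\psi_j = \phi^j$. The sketch below truncates the infinite sum at a large number of terms and compares the result with the closed form $\phi^k \sigma_e^2/(1-\phi^2)$ derived above. The values of `phi`, `sigma2_e`, and the truncation point `m` are arbitrary choices for illustration.

```r
phi <- 0.6
sigma2_e <- 1
m <- 200                      # truncation point for the infinite sum (assumption)
psi <- phi^(0:m)              # psi_0 = 1, psi_1 = phi, psi_2 = phi^2, ...

# Truncated version of gamma_k = sigma_e^2 * sum_i psi_i * psi_{i+k}
gamma_k <- function(k) {
  sigma2_e * sum(psi[1:(m + 1 - k)] * psi[(1 + k):(m + 1)])
}

lags <- 0:5
gamma_numeric <- sapply(lags, gamma_k)
gamma_closed <- sigma2_e * phi^lags / (1 - phi^2)
rho_numeric <- gamma_numeric / gamma_numeric[1]     # autocorrelations

print(cbind(lag = lags, gamma_numeric, gamma_closed, rho_numeric))
```

The autocorrelation column should reproduce $\phi^k$, in line with Equation (4.1.3).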


4.2 Moving Average Processes


In the case where only a finite number of the $\psi$-weights are nonzero, we have what is
called a moving average process. In this case, we change notation† somewhat and write

$$Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q} \tag{4.2.1}$$

We call such a series a moving average of order $q$ and abbreviate the name to MA($q$).
The terminology moving average arises from the fact that $Y_t$ is obtained by applying the
weights $1, -\theta_1, -\theta_2, \ldots, -\theta_q$ to the variables $e_t, e_{t-1}, e_{t-2}, \ldots, e_{t-q}$ and then moving the
weights and applying them to $e_{t+1}, e_t, e_{t-1}, \ldots, e_{t-q+1}$ to obtain $Y_{t+1}$ and so on.
Moving average models were first considered by Slutsky (1927) and Wold (1938).


The  First-Order  Moving  Average  Process 

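We consider in detail the simple but nevertheless important moving average process of
order 1, that is, the MA(1) series. Rather than specialize the formulas in Equation
(4.1.4), it is instructive to rederive the results. The model is $Y_t = e_t - \theta e_{t-1}$. Since
only one $\theta$ is involved, we drop the redundant subscript 1. Clearly $E(Y_t) = 0$
and $\mathrm{Var}(Y_t) = \sigma_e^2(1 + \theta^2)$. Now

$$\begin{aligned}
\mathrm{Cov}(Y_t, Y_{t-1}) &= \mathrm{Cov}(e_t - \theta e_{t-1},\; e_{t-1} - \theta e_{t-2}) \\
&= \mathrm{Cov}(-\theta e_{t-1}, e_{t-1}) = -\theta\sigma_e^2
\end{aligned}$$

and

$$\begin{aligned}
\mathrm{Cov}(Y_t, Y_{t-2}) &= \mathrm{Cov}(e_t - \theta e_{t-1},\; e_{t-2} - \theta e_{t-3}) \\
&= 0
\end{aligned}$$

since there are no $e$'s with subscripts in common between $Y_t$ and $Y_{t-2}$. Similarly,
$\mathrm{Cov}(Y_t, Y_{t-k}) = 0$ whenever $k \ge 2$; that is, the process has no correlation beyond lag
1. This fact will be important later when we need to choose suitable models for real
data.

In summary, for an MA(1) model $Y_t = e_t - \theta e_{t-1}$,

$$\begin{aligned}
E(Y_t) &= 0 \\
\gamma_0 &= \mathrm{Var}(Y_t) = \sigma_e^2(1 + \theta^2) \\
\gamma_1 &= -\theta\sigma_e^2 \\
\rho_1 &= (-\theta)/(1 + \theta^2) \\
\gamma_k &= \rho_k = 0 \quad \text{for } k \ge 2
\end{aligned} \tag{4.2.2}$$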
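The summary in Equation (4.2.2) can be compared with the theoretical autocorrelations returned by `ARMAacf` in R's stats package. Because `ARMAacf` writes the moving average part with plus signs, the sketch passes `-theta`. The helper name `rho1_ma1` and the chosen value of `theta` are assumptions for illustration.

```r
# Lag 1 autocorrelation of an MA(1) process in the text's sign convention
rho1_ma1 <- function(theta) -theta / (1 + theta^2)

theta <- 0.9
rho1_formula <- rho1_ma1(theta)

# ARMAacf uses Y_t = e_t + ma * e_{t-1}, so the sign of theta is flipped
acf_theory <- ARMAacf(ma = -theta, lag.max = 3)

print(rho1_formula)
print(acf_theory)        # lags 0, 1, 2, 3
```

The lag 1 entry should match the formula, and the entries at lags 2 and 3 should be zero, reflecting the absence of correlation beyond lag 1.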


† The reason for this change will be evident later on. Some statistical software, for example
R, uses plus signs before the thetas. Check with yours to see which convention it uses.

Some numerical values for $\rho_1$ versus $\theta$ in Equation (4.2.2) help illustrate the
possibilities. Note that the $\rho_1$ values for negative $\theta$ can be obtained by simply negating the
value given for the corresponding positive $\theta$-value.

  θ     ρ1 = -θ/(1+θ²)        θ     ρ1 = -θ/(1+θ²)
 0.1       -0.099            0.6       -0.441
 0.2       -0.192            0.7       -0.470
 0.3       -0.275            0.8       -0.488
 0.4       -0.345            0.9       -0.497
 0.5       -0.400            1.0       -0.500

A calculus argument shows that the largest value that $\rho_1$ can attain is $\rho_1 = 1/2$ when
$\theta = -1$ and the smallest value is $\rho_1 = -1/2$, which occurs when $\theta = +1$ (see Exercise 4.3).
Exhibit 4.1 displays a graph of the lag 1 autocorrelation values for $\theta$ ranging from -1 to
+1.


Exhibit 4.1 Lag 1 Autocorrelation of an MA(1) Process for Different θ

[Plot of $\rho_1$ against θ; horizontal axis: θ]

Exercise 4.4 asks you to show that when any nonzero value of $\theta$ is replaced by $1/\theta$,
the same value for $\rho_1$ is obtained. For example, $\rho_1$ is the same for $\theta = 1/2$ as for
$\theta = 1/(1/2) = 2$. If we knew that an MA(1) process had $\rho_1 = 0.4$, we still could not tell the precise
value of $\theta$. We will return to this troublesome point when we discuss invertibility in
Section 4.5 on page 79.
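A short numerical check makes the ambiguity concrete. The sketch below evaluates $\rho_1$ at several values of $\theta$ and at their reciprocals, then solves $\rho_1 = -\theta/(1+\theta^2)$ for $\theta$ when $\rho_1 = 0.4$ by rewriting it as the quadratic $\rho_1\theta^2 + \theta + \rho_1 = 0$. The helper name and the particular values of `thetas` are assumptions for illustration.

```r
rho1_ma1 <- function(theta) -theta / (1 + theta^2)

# theta and 1/theta give the same lag 1 autocorrelation
thetas <- c(0.5, -0.5, 0.25, -3)
comparison <- data.frame(theta = thetas,
                         reciprocal = 1 / thetas,
                         rho1_theta = rho1_ma1(thetas),
                         rho1_reciprocal = rho1_ma1(1 / thetas))
print(comparison)

# Both roots of rho1 * theta^2 + theta + rho1 = 0 are consistent with rho1 = 0.4
rho1 <- 0.4
roots <- polyroot(c(rho1, 1, rho1))   # coefficients in increasing powers of theta
print(Re(roots))
```

The two roots are reciprocals of each other, so knowledge of $\rho_1$ alone cannot distinguish between them.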

Exhibit 4.2 shows a time plot of a simulated MA(1) series with $\theta = -0.9$ and
normally distributed white noise. Recall from Exhibit 4.1 that $\rho_1 = 0.4972$ for this model;
thus there is moderately strong positive correlation at lag 1. This correlation is evident
in the plot of the series since consecutive observations tend to be closely related. If an
observation is above the mean level of the series, then the next observation also tends to
be above the mean. The plot is relatively smooth over time, with only occasional large
fluctuations.
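A series of this kind can be generated directly from the model definition. The sketch below does not reproduce the book's `ma1.2.s` data; it simulates a fresh MA(1) series with $\theta = -0.9$ (the seed, series length, and variable names are assumptions) and compares the sample autocorrelations with the theoretical value of $\rho_1$.

```r
set.seed(1234)
n <- 2000                          # deliberately long, to steady the estimates
theta <- -0.9

e <- rnorm(n + 1)                  # white noise; e[1] plays the role of e_0
y <- e[-1] - theta * e[-(n + 1)]   # Y_t = e_t - theta * e_{t-1}

r <- acf(y, lag.max = 2, plot = FALSE)$acf[-1]   # sample r_1 and r_2
rho1_theory <- -theta / (1 + theta^2)

print(c(r1 = r[1], r2 = r[2], rho1_theory = rho1_theory))
```

The sample lag 1 autocorrelation should be close to the theoretical value, and the lag 2 value should be near zero, though both vary from one simulation to the next.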


Exhibit 4.2 Time Plot of an MA(1) Process with θ = -0.9

[Time series plot; horizontal axis: Time]

> win.graph(width=4.875, height=3, pointsize=8)
> data(ma1.2.s); plot(ma1.2.s, ylab=expression(Y[t]), type='o')


The lag 1 autocorrelation is even more apparent in Exhibit 4.3, which plots $Y_t$
versus $Y_{t-1}$. Note the moderately strong upward trend in this plot.

Exhibit 4.3 Plot of $Y_t$ versus $Y_{t-1}$ for MA(1) Series in Exhibit 4.2

[Lag plot; horizontal axis: $Y_{t-1}$]

> win.graph(width=3, height=3, pointsize=8)
> plot(y=ma1.2.s, x=zlag(ma1.2.s), ylab=expression(Y[t]),
    xlab=expression(Y[t-1]), type='p')

The plot of $Y_t$ versus $Y_{t-2}$ in Exhibit 4.4 gives a strong visualization of the zero
autocorrelation at lag 2 for this model.

Exhibit 4.4 Plot of $Y_t$ versus $Y_{t-2}$ for MA(1) Series in Exhibit 4.2

[Lag plot; horizontal axis: $Y_{t-2}$]

> plot(y=ma1.2.s, x=zlag(ma1.2.s,2), ylab=expression(Y[t]),
    xlab=expression(Y[t-2]), type='p')


A somewhat different series is shown in Exhibit 4.5. This is a simulated MA(1)
series with $\theta = +0.9$. Recall from Exhibit 4.1 that $\rho_1 = -0.497$ for this model; thus there
is moderately strong negative correlation at lag 1. This correlation can be seen in the
plot of the series since consecutive observations tend to be on opposite sides of the zero
mean. If an observation is above the mean level of the series, then the next observation
tends to be below the mean. The plot is quite jagged over time, especially when
compared with the plot in Exhibit 4.2.


Exhibit 4.5 Time Plot of an MA(1) Process with θ = +0.9

[Time series plot; horizontal axis: Time]

> win.graph(width=4.875, height=3, pointsize=8)
> data(ma1.1.s)
> plot(ma1.1.s, ylab=expression(Y[t]), type='o')


The negative lag 1 autocorrelation is even more apparent in the lag plot of Exhibit
4.6.

Exhibit 4.6 Plot of $Y_t$ versus $Y_{t-1}$ for MA(1) Series in Exhibit 4.5

[Lag plot; horizontal axis: $Y_{t-1}$]

> win.graph(width=3, height=3, pointsize=8)
> plot(y=ma1.1.s, x=zlag(ma1.1.s), ylab=expression(Y[t]),
    xlab=expression(Y[t-1]), type='p')

The plot of $Y_t$ versus $Y_{t-2}$ in Exhibit 4.7 displays the zero autocorrelation at lag 2
for this model.

Exhibit 4.7 Plot of $Y_t$ versus $Y_{t-2}$ for MA(1) Series in Exhibit 4.5

[Lag plot; horizontal axis: $Y_{t-2}$]

> plot(y=ma1.1.s, x=zlag(ma1.1.s,2), ylab=expression(Y[t]),
    xlab=expression(Y[t-2]), type='p')

MA(1)  processes  have  no  autocorrelation  beyond  lag  1,  but  by  increasing  the  order 
of  the  process,  we  can  obtain  higher-order  correlations. 

The  Second-Order  Moving  Average  Process 

Consider  the  moving  average  process  of  order  2: 

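$$Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}$$

Here

$$\gamma_0 = \mathrm{Var}(Y_t) = \mathrm{Var}(e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}) = (1 + \theta_1^2 + \theta_2^2)\sigma_e^2$$

$$\begin{aligned}
\gamma_1 &= \mathrm{Cov}(Y_t, Y_{t-1}) = \mathrm{Cov}(e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2},\; e_{t-1} - \theta_1 e_{t-2} - \theta_2 e_{t-3}) \\
&= \mathrm{Cov}(-\theta_1 e_{t-1},\, e_{t-1}) + \mathrm{Cov}(-\theta_1 e_{t-2},\, -\theta_2 e_{t-2}) \\
&= [-\theta_1 + (-\theta_1)(-\theta_2)]\sigma_e^2 \\
&= (-\theta_1 + \theta_1\theta_2)\sigma_e^2
\end{aligned}$$

and

$$\begin{aligned}
\gamma_2 &= \mathrm{Cov}(Y_t, Y_{t-2}) = \mathrm{Cov}(e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2},\; e_{t-2} - \theta_1 e_{t-3} - \theta_2 e_{t-4}) \\
&= \mathrm{Cov}(-\theta_2 e_{t-2},\, e_{t-2}) \\
&= -\theta_2\sigma_e^2
\end{aligned}$$

Thus, for an MA(2) process,

$$\begin{aligned}
\rho_1 &= \frac{-\theta_1 + \theta_1\theta_2}{1 + \theta_1^2 + \theta_2^2} \\
\rho_2 &= \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2} \\
\rho_k &= 0 \quad \text{for } k = 3, 4, \ldots
\end{aligned} \qquad (4.2.3)$$

For the specific case $Y_t = e_t - e_{t-1} + 0.6e_{t-2}$, we have

$$\rho_1 = \frac{-1 + (1)(-0.6)}{1 + (1)^2 + (-0.6)^2} = \frac{-1.6}{2.36} = -0.678$$

and

$$\rho_2 = \frac{0.6}{2.36} = 0.254$$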
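The arithmetic above can be checked numerically. The following R sketch evaluates Equation (4.2.3) for $\theta_1 = 1$ and $\theta_2 = -0.6$ and compares the result with `ARMAacf` from the base `stats` package. The variable names are illustrative only. The sign flip in the call rests on the assumption that `ARMAacf` writes the moving average part with plus signs, whereas this chapter writes it with minus signs.

```r
# Book convention: Y_t = e_t - theta1 * e_{t-1} - theta2 * e_{t-2}
theta <- c(1, -0.6)

# Equation (4.2.3)
denom <- 1 + sum(theta^2)
rho1 <- (-theta[1] + theta[1] * theta[2]) / denom
rho2 <- -theta[2] / denom

# stats::ARMAacf uses Y_t = e_t + b1 * e_{t-1} + ..., so pass -theta
rho_check <- ARMAacf(ma = -theta, lag.max = 3)

print(round(c(rho1 = rho1, rho2 = rho2), 3))
print(round(rho_check, 3))   # lags 0 to 3; lag 3 should be zero
```

The two printouts should agree at lags 1 and 2, and the lag 3 value illustrates the cutoff after lag 2.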


A  time  plot  of  a  simulation  of  this  MA(2)  process  is  shown  in  Exhibit  4.8.  The 
series  tends  to  move  back  and  forth  across  the  mean  in  one  time  unit.  This  reflects  the 
fairly strong negative autocorrelation at lag 1.


Exhibit 4.8 Time Plot of an MA(2) Process with $\theta_1 = 1$ and $\theta_2 = -0.6$

> win.graph(width=4.875,height=3,pointsize=8)
> data(ma2.s); plot(ma2.s,ylab=expression(Y[t]),type='o')
The plot in Exhibit 4.9 reflects that negative autocorrelation quite dramatically.


Exhibit 4.9 Plot of $Y_t$ versus $Y_{t-1}$ for MA(2) Series in Exhibit 4.8

> win.graph(width=3,height=3,pointsize=8)
> plot(y=ma2.s,x=zlag(ma2.s),ylab=expression(Y[t]),
    xlab=expression(Y[t-1]),type='p')


The  weak  positive  autocorrelation  at  lag  2  is  displayed  in  Exhibit  4.10. 


Exhibit 4.10 Plot of $Y_t$ versus $Y_{t-2}$ for MA(2) Series in Exhibit 4.8

> plot(y=ma2.s,x=zlag(ma2.s,2),ylab=expression(Y[t]),
    xlab=expression(Y[t-2]),type='p')
Finally, the lack of autocorrelation at lag 3 is apparent from the scatterplot in
Exhibit 4.11.


Exhibit 4.11 Plot of $Y_t$ versus $Y_{t-3}$ for MA(2) Series in Exhibit 4.8

> plot(y=ma2.s,x=zlag(ma2.s,3),ylab=expression(Y[t]),
    xlab=expression(Y[t-3]),type='p')


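The General MA(q) Process

For the general MA(q) process $Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$, similar calculations
show that

$$\gamma_0 = (1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2)\sigma_e^2 \qquad (4.2.4)$$

and

$$\rho_k = \begin{cases}
\dfrac{-\theta_k + \theta_1\theta_{k+1} + \theta_2\theta_{k+2} + \cdots + \theta_{q-k}\theta_q}{1 + \theta_1^2 + \theta_2^2 + \cdots + \theta_q^2} & \text{for } k = 1, 2, \ldots, q \\[2ex]
0 & \text{for } k > q
\end{cases} \qquad (4.2.5)$$

where the numerator of $\rho_q$ is just $-\theta_q$. The autocorrelation function "cuts off" after lag
q; that is, it is zero. Its shape can be almost anything for the earlier lags. Another type of
process, the autoregressive process, provides models for alternative autocorrelation patterns.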
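Equation (4.2.5) translates directly into a few lines of R. The sketch below is one possible implementation, not code from the book; the function name `ma_acf` is hypothetical. It treats the process as having weights $1, -\theta_1, \ldots, -\theta_q$ on $e_t, e_{t-1}, \ldots, e_{t-q}$ and sums the lagged products of those weights.

```r
# Autocorrelations of an MA(q) process in the book's sign convention
# Y_t = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}
ma_acf <- function(theta, lag.max = length(theta) + 2) {
  q <- length(theta)
  w <- c(1, -theta)                      # weights on e_t, e_{t-1}, ..., e_{t-q}
  gamma <- sapply(0:lag.max, function(k) {
    if (k > q) return(0)                 # cutoff after lag q
    sum(w[1:(q + 1 - k)] * w[(1 + k):(q + 1)])
  })
  gamma / gamma[1]                       # rho_k = gamma_k / gamma_0
}

# MA(2) example from the text: theta_1 = 1, theta_2 = -0.6
print(round(ma_acf(c(1, -0.6)), 3))
```

For the MA(2) example, the returned vector covers lags 0 through 4 and is zero from lag 3 onward.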
4.3 Autoregressive Processes


Autoregressive processes are as their name suggests: regressions on themselves. Specifically,
a pth-order autoregressive process $\{Y_t\}$ satisfies the equation

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t \qquad (4.3.1)$$

The current value of the series $Y_t$ is a linear combination of the p most recent past values
of itself plus an "innovation" term $e_t$ that incorporates everything new in the series at
time t that is not explained by the past values. Thus, for every t, we assume that $e_t$ is
independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$. Yule (1926) carried out the original work on
autoregressive processes.†

The  First-Order  Autoregressive  Process 

Again,  it  is  instructive  to  consider  the  first-order  model,  abbreviated  AR(1),  in  detail. 
Assume  the  series  is  stationary  and  satisfies 

$$Y_t = \phi Y_{t-1} + e_t \qquad (4.3.2)$$

where we have dropped the subscript 1 from the coefficient $\phi$ for simplicity. As usual, in
these initial chapters, we assume that the process mean has been subtracted out so that
the series mean is zero. The conditions for stationarity will be considered later.

We  first  take  variances  of  both  sides  of  Equation  (4.3.2)  and  obtain 

$$\gamma_0 = \phi^2\gamma_0 + \sigma_e^2$$

Solving for $\gamma_0$ yields

$$\gamma_0 = \frac{\sigma_e^2}{1 - \phi^2} \qquad (4.3.3)$$

Notice the immediate implication that $\phi^2 < 1$ or that $|\phi| < 1$. Now take Equation
(4.3.2), multiply both sides by $Y_{t-k}$ (k = 1, 2, ...), and take expected values

$$E(Y_{t-k}Y_t) = \phi E(Y_{t-k}Y_{t-1}) + E(e_t Y_{t-k})$$

or

$$\gamma_k = \phi\gamma_{k-1} + E(e_t Y_{t-k})$$

Since the series is assumed to be stationary with zero mean, and since $e_t$ is independent
of $Y_{t-k}$, we obtain

$$E(e_t Y_{t-k}) = E(e_t)E(Y_{t-k}) = 0$$

and so

$$\gamma_k = \phi\gamma_{k-1} \quad \text{for } k = 1, 2, 3, \ldots \qquad (4.3.4)$$

Setting k = 1, we get $\gamma_1 = \phi\gamma_0 = \phi\sigma_e^2/(1 - \phi^2)$. With k = 2, we obtain
$\gamma_2 = \phi^2\sigma_e^2/(1 - \phi^2)$. Now it is easy to see that in general

$$\gamma_k = \phi^k\frac{\sigma_e^2}{1 - \phi^2} \qquad (4.3.5)$$

and thus

$$\rho_k = \frac{\gamma_k}{\gamma_0} = \phi^k \quad \text{for } k = 1, 2, 3, \ldots \qquad (4.3.6)$$

† Recall that we are assuming that $Y_t$ has zero mean. We can always introduce a nonzero
mean by replacing $Y_t$ by $Y_t - \mu$ throughout our equations.

Since $|\phi| < 1$, the magnitude of the autocorrelation function decreases exponentially
as the number of lags, k, increases. If $0 < \phi < 1$, all correlations are positive; if
$-1 < \phi < 0$, the lag 1 autocorrelation is negative ($\rho_1 = \phi$) and the signs of successive
autocorrelations alternate from positive to negative, with their magnitudes decreasing
exponentially. Portions of the graphs of several autocorrelation functions are displayed
in Exhibit 4.12.


Exhibit 4.12 Autocorrelation Functions for Several AR(1) Models

[Figure: panels of AR(1) autocorrelation functions; horizontal axes: Lag, 1 to 12]


Notice that for $\phi$ near ±1, the exponential decay is quite slow (for example, $(0.9)^6 = 0.53$),
but for smaller $\phi$, the decay is quite rapid (for example, $(0.4)^6 = 0.00410$). With $\phi$
near ±1, the strong correlation will extend over many lags and produce a relatively
smooth series if $\phi$ is positive and a very jagged series if $\phi$ is negative.

Exhibit 4.13 displays the time plot of a simulated AR(1) process with $\phi = 0.9$.
Notice how infrequently the series crosses its theoretical mean of zero. There is a lot of
inertia in the series: it hangs together, remaining on the same side of the mean for
extended periods. An observer might claim that the series has several trends. We know
that in fact the theoretical mean is zero for all time points. The illusion of trends is due
to the strong autocorrelation of neighboring values of the series.
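The decay rates and the "inertia" described above can be explored with a short R sketch. It is illustrative only: the helper name `ar1_acf`, the seed, and the series length of 200 are arbitrary assumptions, and the simulated path comes from `arima.sim` in the base `stats` package rather than from the data set used for the book's exhibit.

```r
# Theoretical AR(1) autocorrelations: rho_k = phi^k for k = 0, ..., lag.max
ar1_acf <- function(phi, lag.max = 12) phi^(0:lag.max)

print(round(ar1_acf(0.9)[7], 2))   # rho_6 for phi = 0.9 (slow decay)
print(round(ar1_acf(0.4)[7], 5))   # rho_6 for phi = 0.4 (rapid decay)

# One simulated AR(1) path with phi = 0.9 (seed chosen arbitrarily)
set.seed(123)
y <- arima.sim(model = list(ar = 0.9), n = 200)

# Number of times the path changes sign, i.e. crosses its mean of zero
crossings <- sum(diff(sign(y)) != 0)
print(crossings)

# Sample autocorrelations at the first few lags, for comparison with phi^k
print(acf(y, lag.max = 3, plot = FALSE))
```

Rerunning with `ar = -0.9` should give a far larger crossing count, matching the jagged appearance expected for negative $\phi$.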
Exhibit 4.14 Plot of $Y_t$ versus $Y_{t-1}$ for AR(1) Series of Exhibit 4.13

> win.graph(width=3,height=3,pointsize=8)
> plot(y=ar1.s,x=zlag(ar1.s),ylab=expression(Y[t]),
    xlab=expression(Y[t-1]),type='p')


This AR(1) model also has strong positive autocorrelation at lag 2, namely $\rho_2 =
(0.9)^2 = 0.81$. Exhibit 4.15 shows this quite well.


Exhibit 4.15 Plot of $Y_t$ versus $Y_{t-2}$ for AR(1) Series of Exhibit 4.13

> plot(y=ar1.s,x=zlag(ar1.s,2),ylab=expression(Y[t]),
    xlab=expression(Y[t-2]),type='p')
Finally, at lag 3, the autocorrelation is still quite high: $\rho_3 = (0.9)^3 = 0.729$. Exhibit
4.16 confirms this for this particular series.

Exhibit 4.16 Plot of $Y_t$ versus $Y_{t-3}$ for AR(1) Series of Exhibit 4.13

> plot(y=ar1.s,x=zlag(ar1.s,3),ylab=expression(Y[t]),
    xlab=expression(Y[t-3]),type='p')

The  General  Linear  Process  Version  of  the  AR(1)  Model 

The recursive definition of the AR(1) process given in Equation (4.3.2) is extremely
useful for interpreting the model. For other purposes, it is convenient to express the
AR(1) model as a general linear process as in Equation (4.1.1). The recursive definition
is valid for all t. If we use this equation with t replaced by t − 1, we get
$Y_{t-1} = \phi Y_{t-2} + e_{t-1}$. Substituting this into the original expression gives

$$\begin{aligned}
Y_t &= \phi(\phi Y_{t-2} + e_{t-1}) + e_t \\
&= e_t + \phi e_{t-1} + \phi^2 Y_{t-2}
\end{aligned}$$

If we repeat this substitution into the past, say k − 1 times, we get

$$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \cdots + \phi^{k-1}e_{t-k+1} + \phi^k Y_{t-k} \qquad (4.3.7)$$

Assuming $|\phi| < 1$ and letting k increase without bound, it seems reasonable (this is
almost a rigorous proof) that we should obtain the infinite series representation

$$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \phi^3 e_{t-3} + \cdots \qquad (4.3.8)$$

This is in the form of the general linear process of Equation (4.1.1) with $\psi_j = \phi^j$,
which we already investigated in Section 4.1 on page 55. Note that this representation
reemphasizes the need for the restriction $|\phi| < 1$.
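The back-substitution argument can be illustrated numerically. In the sketch below, which is an illustration under stated assumptions rather than anything from the book, an AR(1) path is generated recursively from a starting value of zero, so that Equation (4.3.7) with k = t reduces to a finite weighted sum of innovations with weights $\phi^j$. The seed, the length n = 50, and the variable names are arbitrary; `stats::filter` and `ARMAtoMA` are base R functions.

```r
set.seed(1)
phi <- 0.9
n <- 50
e <- rnorm(n)

# Recursive definition: Y_t = phi * Y_{t-1} + e_t, started from Y_0 = 0
y_rec <- as.numeric(stats::filter(e, filter = phi, method = "recursive"))

# Linear-process form: Y_t = sum_{j=0}^{t-1} phi^j * e_{t-j}
y_lin <- sapply(seq_len(n), function(t) sum(phi^(0:(t - 1)) * e[t:1]))

print(max(abs(y_rec - y_lin)))   # should be at rounding-error level

# psi_1, ..., psi_5 as computed by stats::ARMAtoMA, compared with phi^j
print(rbind(ARMAtoMA = ARMAtoMA(ar = phi, lag.max = 5), power = phi^(1:5)))
```

Because the recursion here starts from zero, the agreement is exact up to rounding; for a stationary series the remainder term $\phi^k Y_{t-k}$ only vanishes in the limit.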

Stationarity  of  an  AR(1)  Process 

It can be shown that, subject to the restriction that $e_t$ be independent of $Y_{t-1}, Y_{t-2},
Y_{t-3}, \ldots$ and that $\sigma_e^2 > 0$, the solution of the AR(1) defining recursion $Y_t = \phi Y_{t-1} + e_t$
will be stationary if and only if $|\phi| < 1$. The requirement $|\phi| < 1$ is usually called the
stationarity condition for the AR(1) process (see Box, Jenkins, and Reinsel, 1994,
p. 54; Nelson, 1973, p. 39; and Wei, 2005, p. 32) even though more than stationarity is
involved. See especially Exercises 4.16, 4.18, and 4.25.

At this point, we should note that the autocorrelation function for the AR(1) process
has been derived in two different ways. The first method used the general linear process
representation leading up to Equation (4.1.3). The second method used the defining
recursion $Y_t = \phi Y_{t-1} + e_t$ and the development of Equations (4.3.4), (4.3.5), and
(4.3.6). A third derivation is obtained by multiplying both sides of Equation (4.3.7) by
$Y_{t-k}$, taking expected values of both sides, and using the fact that $e_t, e_{t-1}, e_{t-2}, \ldots,
e_{t-(k-1)}$ are independent of $Y_{t-k}$. The second method should be especially noted since
it will generalize nicely to higher-order processes.

The  Second-Order  Autoregressive  Process 

Now  consider  the  series  satisfying 

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t \qquad (4.3.9)$$

where, as usual, we assume that $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$. To discuss
stationarity, we introduce the AR characteristic polynomial

$$\phi(x) = 1 - \phi_1 x - \phi_2 x^2$$

and the corresponding AR characteristic equation

$$1 - \phi_1 x - \phi_2 x^2 = 0$$

We  recall  that  a  quadratic  equation  always  has  two  roots  (possibly  complex). 

Stationarity  of  the  AR(2)  Process 

It may be shown that, subject to the condition that $e_t$ is independent of $Y_{t-1}, Y_{t-2},
Y_{t-3}, \ldots$, a stationary solution to Equation (4.3.9) exists if and only if the roots of the AR
characteristic equation exceed 1 in absolute value (modulus). We sometimes say that the
roots should lie outside the unit circle in the complex plane. This statement will generalize
to the pth-order case without change.†


† It also applies in the first-order case, where the AR characteristic equation is just $1 - \phi x = 0$
with root $1/\phi$, which exceeds 1 in absolute value if and only if $|\phi| < 1$.
In the second-order case, the roots of the quadratic characteristic equation are easily
found to be

$$\frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2} \qquad (4.3.10)$$

For stationarity, we require that these roots exceed 1 in absolute value. In Appendix
B, page 84, we show that this will be true if and only if three conditions are satisfied:

$$\phi_1 + \phi_2 < 1, \quad \phi_2 - \phi_1 < 1, \quad \text{and} \quad |\phi_2| < 1 \qquad (4.3.11)$$

As with the AR(1) model, we call these the stationarity conditions for the AR(2)
model. This stationarity region is displayed in Exhibit 4.17.
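The agreement between the root condition and the three inequalities in (4.3.11) can be spot-checked numerically. The sketch below uses base R's `polyroot`, which takes polynomial coefficients in increasing powers of x. Both helper names are hypothetical, and the three parameter pairs are arbitrary test points, not values taken from the text's exhibits apart from (1.5, −0.75).

```r
# Root condition: all roots of 1 - phi1*x - phi2*x^2 lie outside the unit circle
ar2_stationary_roots <- function(phi1, phi2) {
  roots <- polyroot(c(1, -phi1, -phi2))
  all(Mod(roots) > 1)
}

# Equivalent parameter conditions of Equation (4.3.11)
ar2_stationary_ineq <- function(phi1, phi2) {
  (phi1 + phi2 < 1) && (phi2 - phi1 < 1) && (abs(phi2) < 1)
}

# A few arbitrary test points well away from the boundary of the region
pts <- list(c(1.5, -0.75), c(0.5, 0.6), c(0, -1.2))
for (p in pts) {
  cat(sprintf("phi1 = %5.2f, phi2 = %5.2f: roots %s, inequalities %s\n",
              p[1], p[2],
              ar2_stationary_roots(p[1], p[2]),
              ar2_stationary_ineq(p[1], p[2])))
}
```

Points exactly on the boundary of the region are avoided here because the computed root moduli would then equal 1 only up to rounding error.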


Exhibit 4.17 Stationarity Parameter Region for AR(2) Process

[Figure: stationarity region in the parameter plane; horizontal axis: $\phi_1$]

The  Autocorrelation  Function  for  the  AR(2)  Process 

To derive the autocorrelation function for the AR(2) case, we take the defining recursive
relationship of Equation (4.3.9), multiply both sides by $Y_{t-k}$, and take expectations.
Assuming stationarity, zero means, and that $e_t$ is independent of $Y_{t-k}$, we get

$$\gamma_k = \phi_1\gamma_{k-1} + \phi_2\gamma_{k-2} \quad \text{for } k = 1, 2, 3, \ldots \qquad (4.3.12)$$

or, dividing through by $\gamma_0$,

$$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} \quad \text{for } k = 1, 2, 3, \ldots \qquad (4.3.13)$$

Equations (4.3.12) and/or (4.3.13) are usually called the Yule-Walker equations, especially
the set of two equations obtained for k = 1 and 2. Setting k = 1 and using $\rho_0 = 1$
and $\rho_{-1} = \rho_1$, we get $\rho_1 = \phi_1 + \phi_2\rho_1$ and so

$$\rho_1 = \frac{\phi_1}{1 - \phi_2} \qquad (4.3.14)$$

Using the now known values for $\rho_1$ (and $\rho_0$), Equation (4.3.13) can be used with k = 2 to
obtain

$$\rho_2 = \phi_1\rho_1 + \phi_2\rho_0 = \frac{\phi_2(1 - \phi_2) + \phi_1^2}{1 - \phi_2} \qquad (4.3.15)$$

Successive values of $\rho_k$ may be easily calculated numerically from the recursive relationship
of Equation (4.3.13).
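The recursive calculation just described takes only a few lines of R. This is a minimal sketch with a hypothetical function name, `ar2_acf`, and it assumes `lag.max` is at least 2. The comparison against `ARMAacf` from base R is included only as a cross-check.

```r
# AR(2) autocorrelations from the Yule-Walker recursion (4.3.13),
# started with rho_0 = 1 and rho_1 = phi1 / (1 - phi2) from (4.3.14).
# Assumes lag.max >= 2.
ar2_acf <- function(phi1, phi2, lag.max = 12) {
  rho <- numeric(lag.max + 1)            # rho[k + 1] holds rho_k
  rho[1] <- 1
  rho[2] <- phi1 / (1 - phi2)
  for (k in 2:lag.max) {
    rho[k + 1] <- phi1 * rho[k] + phi2 * rho[k - 1]
  }
  rho
}

rho <- ar2_acf(1.5, -0.75)
print(round(rho, 3))

# Cross-check against the base R implementation
print(max(abs(rho - ARMAacf(ar = c(1.5, -0.75), lag.max = 12))))
```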

Although Equation (4.3.13) is very efficient for calculating autocorrelation values
numerically from given values of $\phi_1$ and $\phi_2$, for other purposes it is desirable to have a
more explicit formula for $\rho_k$. The form of the explicit solution depends critically on the
roots of the characteristic equation $1 - \phi_1 x - \phi_2 x^2 = 0$. Denoting the reciprocals of
these roots by $G_1$ and $G_2$, it is shown in Appendix B, page 84, that

$$G_1 = \frac{\phi_1 - \sqrt{\phi_1^2 + 4\phi_2}}{2} \quad \text{and} \quad G_2 = \frac{\phi_1 + \sqrt{\phi_1^2 + 4\phi_2}}{2}$$

For the case $G_1 \ne G_2$, it can be shown that we have

$$\rho_k = \frac{(1 - G_2^2)G_1^{k+1} - (1 - G_1^2)G_2^{k+1}}{(G_1 - G_2)(1 + G_1 G_2)} \quad \text{for } k \ge 0 \qquad (4.3.16)$$

If the roots are complex (that is, if $\phi_1^2 + 4\phi_2 < 0$), then $\rho_k$ may be rewritten as

$$\rho_k = R^k\frac{\sin(\Theta k + \Phi)}{\sin(\Phi)} \quad \text{for } k \ge 0 \qquad (4.3.17)$$

where $R = \sqrt{-\phi_2}$ and $\Theta$ and $\Phi$ are defined by $\cos(\Theta) = \phi_1/(2\sqrt{-\phi_2})$ and
$\tan(\Phi) = [(1 - \phi_2)/(1 + \phi_2)]\tan(\Theta)$.

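For completeness, we note that if the roots are equal ($\phi_1^2 + 4\phi_2 = 0$), then we have

$$\rho_k = \left(1 + \frac{1 + \phi_2}{1 - \phi_2}k\right)\left(\frac{\phi_1}{2}\right)^k \quad \text{for } k = 0, 1, 2, \ldots \qquad (4.3.18)$$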

A  good  discussion  of  the  derivations  of  these  formulas  can  be  found  in  Fuller  (1996, 
Section  2.5). 

The specific details of these formulas are of little importance to us. We need only
note that the autocorrelation function can assume a wide variety of shapes. In all cases,
the magnitude of $\rho_k$ dies out exponentially fast as the lag k increases. In the case of complex
roots, $\rho_k$ displays a damped sine wave behavior with damping factor R, 0 < R < 1,
frequency $\Theta$, and phase $\Phi$. Illustrations of the possible shapes are given in Exhibit
4.18. (The R function ARMAacf discussed on page 450 is useful for plotting.)
Exhibit 4.18 Autocorrelation Functions for Several AR(2) Models

[Figure: four panels of AR(2) autocorrelation functions; horizontal axes: Lag, 1 to 12]

Exhibit 4.19 displays the time plot of a simulated AR(2) series with $\phi_1 = 1.5$ and
$\phi_2 = -0.75$. The periodic behavior of $\rho_k$ shown in Exhibit 4.18 is clearly reflected in the
nearly periodic behavior of the series with the same period of 360/30 = 12 time units. If
$\Theta$ is measured in radians, $2\pi/\Theta$ is sometimes called the quasi-period of the AR(2) process.
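The damped sine wave form and the quasi-period can be computed directly for $\phi_1 = 1.5$ and $\phi_2 = -0.75$. The sketch below assumes the definitions $R = \sqrt{-\phi_2}$, $\cos(\Theta) = \phi_1/(2\sqrt{-\phi_2})$, and $\tan(\Phi) = [(1 - \phi_2)/(1 + \phi_2)]\tan(\Theta)$ from Equation (4.3.17), and uses `ARMAacf` from base R as an independent check. The variable names are illustrative.

```r
phi1 <- 1.5
phi2 <- -0.75

# Damping factor, frequency, and phase of the damped sine wave
R     <- sqrt(-phi2)
Theta <- acos(phi1 / (2 * sqrt(-phi2)))
Phi   <- atan((1 - phi2) / (1 + phi2) * tan(Theta))

print(c(R = R, degrees = Theta * 180 / pi, quasi_period = 2 * pi / Theta))

# Equation (4.3.17) versus the Yule-Walker based values from ARMAacf
k <- 0:24
rho_sine <- R^k * sin(Theta * k + Phi) / sin(Phi)
rho_yw   <- ARMAacf(ar = c(phi1, phi2), lag.max = 24)
print(max(abs(rho_sine - rho_yw)))   # should be at rounding-error level
```

The frequency works out to 30 degrees per time unit, which is where the period of 360/30 = 12 time units comes from.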
The Variance for the AR(2) Model

The process variance $\gamma_0$ can be expressed in terms of the model parameters $\phi_1$, $\phi_2$, and
$\sigma_e^2$ as follows: Taking the variance of both sides of Equation (4.3.9) yields

$$\gamma_0 = (\phi_1^2 + \phi_2^2)\gamma_0 + 2\phi_1\phi_2\gamma_1 + \sigma_e^2 \qquad (4.3.19)$$

Setting k = 1 in Equation (4.3.12) gives a second linear equation for $\gamma_0$ and $\gamma_1$,
$\gamma_1 = \phi_1\gamma_0 + \phi_2\gamma_1$, which can be solved simultaneously with Equation (4.3.19) to
obtain

$$\begin{aligned}
\gamma_0 &= \frac{(1 - \phi_2)\sigma_e^2}{(1 - \phi_2)(1 - \phi_1^2 - \phi_2^2) - 2\phi_2\phi_1^2} \\
&= \left(\frac{1 - \phi_2}{1 + \phi_2}\right)\frac{\sigma_e^2}{(1 - \phi_2)^2 - \phi_1^2}
\end{aligned} \qquad (4.3.20)$$
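Equation (4.3.20) can be cross-checked against the general linear process result that the variance equals $\sigma_e^2$ times the sum of the squared ψ-coefficients. The sketch below is illustrative: the function name `ar2_var` is hypothetical, $\sigma_e^2$ is taken as 1, and truncating the ψ-weights at 500 terms is an arbitrary choice that is more than enough for these parameter values.

```r
# Process variance of a stationary AR(2) model, Equation (4.3.20)
ar2_var <- function(phi1, phi2, sigma2 = 1) {
  (1 - phi2) / (1 + phi2) * sigma2 / ((1 - phi2)^2 - phi1^2)
}

phi1 <- 1.5
phi2 <- -0.75

# Cross-check: gamma_0 = sigma2 * sum(psi_j^2), truncated at 500 terms
psi <- c(1, ARMAtoMA(ar = c(phi1, phi2), lag.max = 500))

print(c(formula = ar2_var(phi1, phi2), psi_sum = sum(psi^2)))
```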


The ψ-Coefficients for the AR(2) Model


The ψ-coefficients in the general linear process representation for an AR(2) series are
more complex than for the AR(1) case. However, we can substitute the general linear
process representation using Equation (4.1.1) for $Y_t$, for $Y_{t-1}$, and for $Y_{t-2}$ into
$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$. If we then equate coefficients of $e_j$, we get the recursive
relationships

$$\begin{aligned}
\psi_0 &= 1 \\
\psi_1 - \phi_1\psi_0 &= 0 \\
\psi_j - \phi_1\psi_{j-1} - \phi_2\psi_{j-2} &= 0 \quad \text{for } j = 2, 3, \ldots
\end{aligned} \qquad (4.3.21)$$

These may be solved recursively to obtain $\psi_0 = 1$, $\psi_1 = \phi_1$, $\psi_2 = \phi_1^2 + \phi_2$, and so on.
These relationships provide excellent numerical solutions for the ψ-coefficients for
given numerical values of $\phi_1$ and $\phi_2$.

One can also show that, for $G_1 \ne G_2$, an explicit solution is

$$\psi_j = \frac{G_1^{j+1} - G_2^{j+1}}{G_1 - G_2} \qquad (4.3.22)$$

where, as before, $G_1$ and $G_2$ are the reciprocals of the roots of the AR characteristic
equation. If the roots are complex, Equation (4.3.22) may be rewritten as

$$\psi_j = R^j\left\{\frac{\sin[(j + 1)\Theta]}{\sin(\Theta)}\right\} \qquad (4.3.23)$$

a damped sine wave with the same damping factor R and frequency $\Theta$ as in Equation
(4.3.17) for the autocorrelation function.

For completeness, we note that if the roots are equal, so that $G_1 = G_2 = \phi_1/2$, then

$$\psi_j = (1 + j)\left(\frac{\phi_1}{2}\right)^j \qquad (4.3.24)$$
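The recursion (4.3.21) and the closed forms can be compared numerically. The sketch below uses a hypothetical helper, `ar2_psi`, and assumes n is at least 2. The complex-root check uses $\phi_1 = 1.5$, $\phi_2 = -0.75$. The equal-root check uses $\phi_1 = 1$, $\phi_2 = -0.25$, for which both reciprocal roots equal $\phi_1/2 = 0.5$, and compares against $(1 + j)(\phi_1/2)^j$, the limit of Equation (4.3.22) as $G_1 \to G_2$.

```r
# psi-coefficients of an AR(2) model from the recursion (4.3.21).
# Assumes n >= 2; psi[j + 1] holds psi_j.
ar2_psi <- function(phi1, phi2, n = 10) {
  psi <- numeric(n + 1)
  psi[1] <- 1
  psi[2] <- phi1
  for (j in 2:n) {
    psi[j + 1] <- phi1 * psi[j] + phi2 * psi[j - 1]
  }
  psi
}

# Complex roots: compare with the damped sine wave (4.3.23)
phi1 <- 1.5; phi2 <- -0.75
j <- 0:10
R <- sqrt(-phi2)
Theta <- acos(phi1 / (2 * sqrt(-phi2)))
psi_sine <- R^j * sin((j + 1) * Theta) / sin(Theta)
print(max(abs(ar2_psi(phi1, phi2) - psi_sine)))

# Equal roots: compare with (1 + j) * (phi1 / 2)^j
psi_equal <- (1 + j) * (1 / 2)^j
print(max(abs(ar2_psi(1, -0.25) - psi_equal)))
```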
The General Autoregressive Process

Consider now the pth-order autoregressive model

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t \qquad (4.3.25)$$

with AR characteristic polynomial

$$\phi(x) = 1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p \qquad (4.3.26)$$

and corresponding AR characteristic equation

$$1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0 \qquad (4.3.27)$$

As noted earlier, assuming that $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$, a stationary
solution to Equation (4.3.25) exists if and only if the p roots of the AR characteristic
equation each exceed 1 in absolute value (modulus). Other relationships between polynomial
roots and coefficients may be used to show that the following two inequalities
are necessary for stationarity. That is, for the roots to be greater than 1 in modulus, it is
necessary, but not sufficient, that both

$$\phi_1 + \phi_2 + \cdots + \phi_p < 1 \quad \text{and} \quad |\phi_p| < 1 \qquad (4.3.28)$$

Assuming stationarity and zero means, we may multiply Equation (4.3.25) by $Y_{t-k}$,
take expectations, divide by $\gamma_0$, and obtain the important recursive relationship

$$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} + \phi_3\rho_{k-3} + \cdots + \phi_p\rho_{k-p} \quad \text{for } k \ge 1 \qquad (4.3.29)$$

Putting k = 1, 2, ..., and p into Equation (4.3.29) and using $\rho_0 = 1$ and $\rho_{-k} = \rho_k$, we get
the general Yule-Walker equations

$$\begin{aligned}
\rho_1 &= \phi_1 + \phi_2\rho_1 + \phi_3\rho_2 + \cdots + \phi_p\rho_{p-1} \\
\rho_2 &= \phi_1\rho_1 + \phi_2 + \phi_3\rho_1 + \cdots + \phi_p\rho_{p-2} \\
&\;\;\vdots \\
\rho_p &= \phi_1\rho_{p-1} + \phi_2\rho_{p-2} + \phi_3\rho_{p-3} + \cdots + \phi_p
\end{aligned} \qquad (4.3.30)$$

Given numerical values for $\phi_1, \phi_2, \ldots, \phi_p$, these linear equations can be solved to
obtain numerical values for $\rho_1, \rho_2, \ldots, \rho_p$. Then Equation (4.3.29) can be used to obtain
numerical values for $\rho_k$ at any number of higher lags.
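The two-stage calculation just described, solving the Yule-Walker equations for $\rho_1, \ldots, \rho_p$ and then recursing, can be sketched in R as follows. The function name `yw_acf` is hypothetical and the AR(3) coefficients in the example are arbitrary illustrative values. The linear system is written as $(I - A)\rho = b$, where A collects the coefficients multiplying unknown autocorrelations and b collects the terms that multiply $\rho_0 = 1$.

```r
# Autocorrelations of an AR(p) model via the Yule-Walker equations (4.3.30)
# followed by the recursion (4.3.29) for lags beyond p.
yw_acf <- function(phi, lag.max = 10) {
  p <- length(phi)
  A <- matrix(0, p, p)
  b <- numeric(p)
  for (k in 1:p) {
    for (i in 1:p) {
      m <- abs(k - i)                    # rho_{k-i} = rho_{|k-i|}
      if (m == 0) {
        b[k] <- b[k] + phi[i]            # term multiplying rho_0 = 1
      } else {
        A[k, m] <- A[k, m] + phi[i]      # term multiplying rho_m
      }
    }
  }
  rho <- solve(diag(p) - A, b)           # rho_1, ..., rho_p
  if (lag.max > p) {
    for (k in (p + 1):lag.max) {
      rho[k] <- sum(phi * rho[k - seq_len(p)])
    }
  }
  c(1, rho[seq_len(lag.max)])            # prepend rho_0
}

phi <- c(0.5, -0.3, 0.2)                 # arbitrary AR(3) example
print(round(yw_acf(phi, 8), 4))
print(max(abs(yw_acf(phi, 8) - ARMAacf(ar = phi, lag.max = 8))))
```

For p = 2 the solve step reproduces Equations (4.3.14) and (4.3.15).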

Noting that

$$E(e_t Y_t) = E[e_t(\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t)] = E(e_t^2) = \sigma_e^2$$

we may multiply Equation (4.3.25) by $Y_t$, take expectations, and find

$$\gamma_0 = \phi_1\gamma_1 + \phi_2\gamma_2 + \cdots + \phi_p\gamma_p + \sigma_e^2$$

which, using $\rho_k = \gamma_k/\gamma_0$, can be written as

$$\gamma_0 = \frac{\sigma_e^2}{1 - \phi_1\rho_1 - \phi_2\rho_2 - \cdots - \phi_p\rho_p} \qquad (4.3.31)$$

This expresses the process variance $\gamma_0$ in terms of the parameters $\sigma_e^2$, $\phi_1, \phi_2, \ldots, \phi_p$, and the
now known values of $\rho_1, \rho_2, \ldots, \rho_p$. Of course, explicit solutions for $\rho_k$ are essentially
impossible in this generality, but we can say that $\rho_k$ will be a linear combination of
exponentially decaying terms (corresponding to the real roots of the characteristic equation)
and damped sine wave terms (corresponding to the complex roots of the characteristic
equation).

Assuming stationarity, the process can also be expressed in the general linear process
form of Equation (4.1.1), but the ψ-coefficients are complicated functions of the
parameters $\phi_1, \phi_2, \ldots, \phi_p$. The coefficients can be found numerically; see Appendix C on
page 85.

4.4  The  Mixed  Autoregressive  Moving  Average  Model 


If we assume that the series is partly autoregressive and partly moving average, we
obtain a quite general time series model. In general, if

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q} \qquad (4.4.1)$$

we say that $\{Y_t\}$ is a mixed autoregressive moving average process of orders p and q,
respectively; we abbreviate the name to ARMA(p,q). As usual, we discuss an important
special case first.†

† In mixed models, we assume that there are no common factors in the autoregressive and
moving average polynomials. If there were, we could cancel them and the model would
reduce to an ARMA model of lower order. For ARMA(1,1), this means $\theta \ne \phi$.

The ARMA(1,1) Model


The defining equation can be written

$$Y_t = \phi Y_{t-1} + e_t - \theta e_{t-1} \qquad (4.4.2)$$

To derive Yule-Walker type equations, we first note that

$$\begin{aligned}
E(e_t Y_t) &= E[e_t(\phi Y_{t-1} + e_t - \theta e_{t-1})] \\
&= \sigma_e^2
\end{aligned}$$

and

$$\begin{aligned}
E(e_{t-1}Y_t) &= E[e_{t-1}(\phi Y_{t-1} + e_t - \theta e_{t-1})] \\
&= \phi\sigma_e^2 - \theta\sigma_e^2 \\
&= (\phi - \theta)\sigma_e^2
\end{aligned}$$

If we multiply Equation (4.4.2) by $Y_{t-k}$ and take expectations, we have

$$\begin{aligned}
\gamma_0 &= \phi\gamma_1 + [1 - \theta(\phi - \theta)]\sigma_e^2 \\
\gamma_1 &= \phi\gamma_0 - \theta\sigma_e^2 \\
\gamma_k &= \phi\gamma_{k-1} \quad \text{for } k \ge 2
\end{aligned} \qquad (4.4.3)$$

Solving the first two equations yields

$$\gamma_0 = \frac{(1 - 2\phi\theta + \theta^2)}{1 - \phi^2}\sigma_e^2 \qquad (4.4.4)$$

and solving the simple recursion gives

$$\rho_k = \frac{(1 - \theta\phi)(\phi - \theta)}{1 - 2\theta\phi + \theta^2}\phi^{k-1} \quad \text{for } k \ge 1 \qquad (4.4.5)$$


Note that this autocorrelation function decays exponentially as the lag k increases.
The damping factor is $\phi$, but the decay starts from initial value $\rho_1$, which also depends
on $\theta$. This is in contrast to the AR(1) autocorrelation, which also decays with damping
factor $\phi$ but always from initial value $\rho_0 = 1$. For example, if $\phi = 0.8$ and $\theta = 0.4$, then
$\rho_1 = 0.523$, $\rho_2 = 0.418$, $\rho_3 = 0.335$, and so on. Several shapes for $\rho_k$ are possible,
depending on the sign of $\rho_1$ and the sign of $\phi$.
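The numerical example can be reproduced with a few lines of R. This sketch implements the ARMA(1,1) autocorrelation formula $\rho_k = \frac{(1 - \theta\phi)(\phi - \theta)}{1 - 2\theta\phi + \theta^2}\phi^{k-1}$ for $k \ge 1$ in the chapter's sign convention. The function name `arma11_acf` is hypothetical, and the cross-check assumes that `ARMAacf` in base R writes the moving average term with a plus sign, so $-\theta$ is passed as `ma`.

```r
# ARMA(1,1) autocorrelations for Y_t = phi * Y_{t-1} + e_t - theta * e_{t-1}
arma11_acf <- function(phi, theta, lag.max = 10) {
  k <- 1:lag.max
  rho <- (1 - theta * phi) * (phi - theta) /
    (1 - 2 * theta * phi + theta^2) * phi^(k - 1)
  c(1, rho)                              # prepend rho_0 = 1
}

print(round(arma11_acf(0.8, 0.4, 3), 3))

# Cross-check with base R (note the sign change on the MA coefficient)
print(round(ARMAacf(ar = 0.8, ma = -0.4, lag.max = 3), 3))
```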

The general linear process form of the model can be obtained in the same manner
that led to Equation (4.3.8). We find

$$Y_t = e_t + (\phi - \theta)\sum_{j=1}^{\infty}\phi^{j-1}e_{t-j} \qquad (4.4.6)$$

that is,

$$\psi_j = (\phi - \theta)\phi^{j-1} \quad \text{for } j \ge 1$$

We should now mention the obvious stationarity condition $|\phi| < 1$, or equivalently
the root of the AR characteristic equation $1 - \phi x = 0$ must exceed unity in absolute
value.

For the general ARMA(p,q) model, we state the following facts without proof:
Subject to the condition that $e_t$ is independent of $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$, a stationary
solution to Equation (4.4.1) exists if and only if all the roots of the AR characteristic
equation $\phi(x) = 0$ exceed unity in modulus.

If the stationarity conditions are satisfied, then the model can also be written as a
general linear process with ψ-coefficients determined from

$$\begin{aligned}
\psi_0 &= 1 \\
\psi_1 &= -\theta_1 + \phi_1 \\
\psi_2 &= -\theta_2 + \phi_2 + \phi_1\psi_1 \\
&\;\;\vdots \\
\psi_j &= -\theta_j + \phi_p\psi_{j-p} + \phi_{p-1}\psi_{j-p+1} + \cdots + \phi_1\psi_{j-1}
\end{aligned} \qquad (4.4.7)$$

where we take $\psi_j = 0$ for j < 0 and $\theta_j = 0$ for j > q.

Again assuming stationarity, the autocorrelation function can easily be shown to
satisfy

$$\rho_k = \phi_1\rho_{k-1} + \phi_2\rho_{k-2} + \cdots + \phi_p\rho_{k-p} \quad \text{for } k > q \qquad (4.4.8)$$

Similar equations can be developed for k = 1, 2, 3, ..., q that involve $\theta_1, \theta_2, \ldots, \theta_q$. An
algorithm suitable for numerical computation of the complete autocorrelation function
is given in Appendix C on page 85. (This algorithm is implemented in the R function
named ARMAacf.)
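The ψ-coefficient recursion for a stationary ARMA(p,q) model, $\psi_j = -\theta_j + \phi_1\psi_{j-1} + \cdots + \phi_p\psi_{j-p}$ with $\psi_0 = 1$, $\psi_j = 0$ for $j < 0$, and $\theta_j = 0$ for $j > q$, is simple to code. The sketch below is illustrative and uses a hypothetical function name, `arma_psi`. The comparison with `ARMAtoMA` from base R assumes that function writes the moving average part with plus signs, so $-\theta$ is passed as `ma`.

```r
# psi-coefficients of an ARMA(p, q) model in the book's sign convention
# Y_t = phi_1 Y_{t-1} + ... + e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}.
# psi[j + 1] holds psi_j; assumes n >= 1.
arma_psi <- function(phi = numeric(0), theta = numeric(0), n = 10) {
  p <- length(phi)
  q <- length(theta)
  psi <- numeric(n + 1)
  psi[1] <- 1
  for (j in 1:n) {
    th <- if (j <= q) theta[j] else 0    # theta_j = 0 for j > q
    i <- seq_len(min(j, p))              # psi_{j-i} = 0 when j - i < 0
    psi[j + 1] <- -th + sum(phi[i] * psi[j + 1 - i])
  }
  psi
}

# ARMA(1,1) with phi = 0.8, theta = 0.4: psi_j should equal (phi - theta) * phi^(j-1)
print(arma_psi(0.8, 0.4, 5))
print(c(1, (0.8 - 0.4) * 0.8^(0:4)))
print(c(1, ARMAtoMA(ar = 0.8, ma = -0.4, lag.max = 5)))
```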


4.5  Invertibility 


We have seen that for the MA(1) process we get exactly the same autocorrelation function
if $\theta$ is replaced by $1/\theta$. In the exercises, we find a similar problem with nonuniqueness
for the MA(2) model. This lack of uniqueness of MA models, given their
autocorrelation functions, must be addressed before we try to infer the values of parameters
from observed time series. It turns out that this nonuniqueness is related to the
seemingly unrelated question stated next.

An autoregressive process can always be reexpressed as a general linear process
through the ψ-coefficients so that an AR process may also be thought of as an
infinite-order moving average process. However, for some purposes, the autoregressive
representations are also convenient. Can a moving average model be reexpressed as an
autoregression?

To  fix  ideas,  consider  an  MA(1)  model: 

$$Y_t = e_t - \theta e_{t-1} \qquad (4.5.1)$$

First rewriting this as $e_t = Y_t + \theta e_{t-1}$ and then replacing t by t − 1 and substituting for
$e_{t-1}$ above, we get

$$\begin{aligned}
e_t &= Y_t + \theta(Y_{t-1} + \theta e_{t-2}) \\
&= Y_t + \theta Y_{t-1} + \theta^2 e_{t-2}
\end{aligned}$$

If $|\theta| < 1$, we may continue this substitution "infinitely" into the past and obtain the
expression [compare with Equations (4.3.7) and (4.3.8)]

$$e_t = Y_t + \theta Y_{t-1} + \theta^2 Y_{t-2} + \cdots$$

or

$$Y_t = (-\theta Y_{t-1} - \theta^2 Y_{t-2} - \theta^3 Y_{t-3} - \cdots) + e_t \qquad (4.5.2)$$

If $|\theta| < 1$, we see that the MA(1) model can be inverted into an infinite-order autoregressive
model. We say that the MA(1) model is invertible if and only if $|\theta| < 1$.

For a general MA(q) or ARMA(p,q) model, we define the MA characteristic
polynomial as

$$\theta(x) = 1 - \theta_1 x - \theta_2 x^2 - \theta_3 x^3 - \cdots - \theta_q x^q \qquad (4.5.3)$$

and the corresponding MA characteristic equation

$$1 - \theta_1 x - \theta_2 x^2 - \theta_3 x^3 - \cdots - \theta_q x^q = 0 \qquad (4.5.4)$$

It can be shown that the MA(q) model is invertible; that is, there are coefficients $\pi_j$
such that

$$Y_t = \pi_1 Y_{t-1} + \pi_2 Y_{t-2} + \pi_3 Y_{t-3} + \cdots + e_t \qquad (4.5.5)$$

if and only if the roots of the MA characteristic equation exceed 1 in modulus. (Compare
this with stationarity of an AR model.)

It may also be shown that there is only one set of parameter values that yield an
invertible MA process with a given autocorrelation function. For example, $Y_t =
e_t + 2e_{t-1}$ and $Y_t = e_t + \tfrac{1}{2}e_{t-1}$ both have the same autocorrelation function, but only the
second one with root −2 is invertible. From here on, we will restrict our attention to the
physically sensible class of invertible models.

For a general ARMA(p,q) model, we require both stationarity and invertibility.
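The example with coefficients 2 and ½ can be verified numerically. The sketch below checks invertibility through the roots of the MA characteristic polynomial using base R's `polyroot`, and confirms that both models share the same lag 1 autocorrelation. The function name `ma_invertible` is hypothetical. In the chapter's sign convention, $Y_t = e_t + 2e_{t-1}$ corresponds to $\theta = -2$ and $Y_t = e_t + \tfrac{1}{2}e_{t-1}$ to $\theta = -0.5$.

```r
# Invertibility check: all roots of 1 - theta_1 x - ... - theta_q x^q
# must lie outside the unit circle (coefficients in increasing powers of x)
ma_invertible <- function(theta) {
  all(Mod(polyroot(c(1, -theta))) > 1)
}

print(polyroot(c(1, 2)))      # root of 1 + 2x: modulus 1/2
print(polyroot(c(1, 0.5)))    # root of 1 + x/2: modulus 2
print(c(theta_minus_2 = ma_invertible(-2),
        theta_minus_half = ma_invertible(-0.5)))

# Both models have the same lag 1 autocorrelation, rho_1 = -theta / (1 + theta^2)
rho1 <- function(theta) -theta / (1 + theta^2)
print(c(rho1(-2), rho1(-0.5)))
```

Only the second model passes the invertibility check, even though the autocorrelations are identical.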

4.6  Summary 


This  chapter  introduces  the  simple  but  very  useful  autoregressive,  moving  average 
(ARMA)  time  series  models.  The  basic  statistical  properties  of  these  models  were 
derived  in  particular  for  the  important  special  cases  of  moving  averages  of  orders  1  and 
2  and  autoregressive  processes  of  orders  1  and  2.  Stationarity  and  invertibility  issues 
have  been  pursued  for  these  cases.  Properties  of  mixed  ARMA  models  have  also  been 
investigated. You should be well-versed in the autocorrelation properties of these models
and the various representations of the models.
Exercises


4.1  Use  first  principles  to  find  the  autocorrelation  function  for  the  stationary  process 
defined  by 

Yt  =  5  +  e,~\et-\  +  \et-2 

4.2 Sketch the autocorrelation functions for the following MA(2) models with parameters
as specified:

(a) $\theta_1 = 0.5$ and $\theta_2 = 0.4$.

(b) $\theta_1 = 1.2$ and $\theta_2 = -0.7$.

(c) $\theta_1 = -1$ and $\theta_2 = -0.6$.

4.3 Verify that for an MA(1) process

$$\max_{-\infty < \theta < \infty}\rho_1 = 0.5 \quad \text{and} \quad \min_{-\infty < \theta < \infty}\rho_1 = -0.5$$

4.4 Show that when $\theta$ is replaced by $1/\theta$, the autocorrelation function for an MA(1)
process does not change.

4.5 Calculate and sketch the autocorrelation functions for each of the following
AR(1) models. Plot for sufficient lags that the autocorrelation function has nearly
died out.

(a) $\phi_1 = 0.6$.

(b) $\phi_1 = -0.6$.

(c) $\phi_1 = 0.95$. (Do out to 20 lags.)

(d) $\phi_1 = 0.3$.

4.6 Suppose that $\{Y_t\}$ is an AR(1) process with $-1 < \phi < +1$.

(a) Find the autocovariance function for $W_t = \nabla Y_t = Y_t - Y_{t-1}$ in terms of $\phi$ and
$\sigma_e^2$.

(b) In particular, show that $\mathrm{Var}(W_t) = 2\sigma_e^2/(1 + \phi)$.

4.7 Describe the important characteristics of the autocorrelation function for the following
models: (a) MA(1), (b) MA(2), (c) AR(1), (d) AR(2), and (e) ARMA(1,1).

4.8 Let $\{Y_t\}$ be an AR(2) process of the special form $Y_t = \phi_2 Y_{t-2} + e_t$. Use first principles
to find the range of values of $\phi_2$ for which the process is stationary.

4.9 Use the recursive formula of Equation (4.3.13) to calculate and then sketch the
autocorrelation functions for the following AR(2) models with parameters as
specified. In each case, specify whether the roots of the characteristic equation are
real or complex. If the roots are complex, find the damping factor, R, and frequency,
$\Theta$, for the corresponding autocorrelation function when expressed as in
Equation (4.3.17), on page 73.

(a) $\phi_1 = 0.6$ and $\phi_2 = 0.3$.

(b) $\phi_1 = -0.4$ and $\phi_2 = 0.5$.

(c) $\phi_1 = 1.2$ and $\phi_2 = -0.7$.

(d) $\phi_1 = -1$ and $\phi_2 = -0.6$.

(e) $\phi_1 = 0.5$ and $\phi_2 = -0.9$.

(f) $\phi_1 = -0.5$ and $\phi_2 = -0.6$.
(f)  (f)  j  =  -0.5  and  c))2  =  -0.6. 
4.10 Sketch the autocorrelation functions for each of the following ARMA models:

(a) ARMA(1,1) with $\phi = 0.7$ and $\theta = 0.4$.

(b) ARMA(1,1) with $\phi = 0.7$ and $\theta = -0.4$.

4.11 For the ARMA(1,2) model $Y_t = 0.8Y_{t-1} + e_t + 0.7e_{t-1} + 0.6e_{t-2}$, show that

(a) $\rho_k = 0.8\rho_{k-1}$ for $k > 2$.

(b) $\rho_2 = 0.8\rho_1 + 0.6\sigma_e^2/\gamma_0$.

4.12 Consider two MA(2) processes, one with $\theta_1 = \theta_2 = 1/6$ and another with $\theta_1 = -1$
and $\theta_2 = 6$.

(a) Show that these processes have the same autocorrelation function.

(b) How do the roots of the corresponding characteristic polynomials compare?

4.13 Let $\{Y_t\}$ be a stationary process with $\rho_k = 0$ for $k > 1$. Show that we must have
$|\rho_1| \le \frac{1}{2}$. (Hint: Consider $\mathrm{Var}(Y_{n+1} + Y_n + \cdots + Y_1)$ and then
$\mathrm{Var}(Y_{n+1} - Y_n + Y_{n-1} - \cdots \pm Y_1)$. Use the fact that both of these must be
nonnegative for all $n$.)

4.14 Suppose that $\{Y_t\}$ is a zero-mean, stationary process with $|\rho_1| < 0.5$ and $\rho_k = 0$ for
$k > 1$. Show that $\{Y_t\}$ must be representable as an MA(1) process. That is, show
that there is a white noise sequence $\{e_t\}$ such that $Y_t = e_t - \theta e_{t-1}$, where $\rho_1$ is correct
and $e_t$ is uncorrelated with $Y_{t-k}$ for $k > 0$. (Hint: Choose $\theta$ such that $|\theta| < 1$
and $\rho_1 = -\theta/(1 + \theta^2)$; then let $e_t = \sum_{j=0}^{\infty} \theta^j Y_{t-j}$. If we assume that $\{Y_t\}$ is a normal
process, $e_t$ will also be normal, and zero correlation is equivalent to independence.)

4.15 Consider the AR(1) model $Y_t = \phi Y_{t-1} + e_t$. Show that if $|\phi| = 1$ the process cannot
be stationary. (Hint: Take variances of both sides.)

4.16 Consider the “nonstationary” AR(1) model $Y_t = 3Y_{t-1} + e_t$.

(a) Show that $Y_t = -\sum_{j=1}^{\infty} \left(\frac{1}{3}\right)^j e_{t+j}$ satisfies the AR(1) equation.

(b) Show that the process defined in part (a) is stationary.

(c) In what way is this solution unsatisfactory?

4.17 Consider a process that satisfies the AR(1) equation $Y_t = \frac{1}{2}Y_{t-1} + e_t$.

(a) Show that $Y_t = 10\left(\frac{1}{2}\right)^t + e_t + \frac{1}{2}e_{t-1} + \left(\frac{1}{2}\right)^2 e_{t-2} + \cdots$ is a solution of the AR(1)
equation.

(b) Is the solution given in part (a) stationary?

4.18 Consider a process that satisfies the zero-mean, “stationary” AR(1) equation
$Y_t = \phi Y_{t-1} + e_t$ with $-1 < \phi < +1$. Let $c$ be any nonzero constant, and define
$W_t = Y_t + c\phi^t$.

(a) Show that $E(W_t) = c\phi^t$.

(b) Show that $\{W_t\}$ satisfies the “stationary” AR(1) equation $W_t = \phi W_{t-1} + e_t$.

(c) Is $\{W_t\}$ stationary?

4.19 Consider an MA(6) model with $\theta_1 = 0.5$, $\theta_2 = -0.25$, $\theta_3 = 0.125$, $\theta_4 = -0.0625$,
$\theta_5 = 0.03125$, and $\theta_6 = -0.015625$. Find a much simpler model that has nearly the
same $\psi$-weights.

4.20 Consider an MA(7) model with $\theta_1 = 1$, $\theta_2 = -0.5$, $\theta_3 = 0.25$, $\theta_4 = -0.125$,
$\theta_5 = 0.0625$, $\theta_6 = -0.03125$, and $\theta_7 = 0.015625$. Find a much simpler model that
has nearly the same $\psi$-weights.
4.21 Consider the model $Y_t = e_{t-1} - e_{t-2} + 0.5e_{t-3}$.

(a) Find the autocovariance function for this process.

(b) Show that this is a certain ARMA(p,q) process in disguise. That is, identify
values for $p$ and $q$ and for the $\theta$’s and $\phi$’s such that the ARMA(p,q) process
has the same statistical properties as $\{Y_t\}$.

4.22 Show that the statement “The roots of $1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p = 0$ are
greater than 1 in absolute value” is equivalent to the statement “The roots of
$x^p - \phi_1 x^{p-1} - \phi_2 x^{p-2} - \cdots - \phi_p = 0$ are less than 1 in absolute value.” (Hint: If
$G$ is a root of one equation, is $1/G$ a root of the other?)

4.23 Suppose that $\{Y_t\}$ is an AR(1) process with $\rho_1 = \phi$. Define the sequence $\{b_t\}$ as
$b_t = Y_t - \phi Y_{t+1}$.

(a) Show that $\mathrm{Cov}(b_t, b_{t-k}) = 0$ for all $t$ and $k$.

(b) Show that $\mathrm{Cov}(b_t, Y_{t+k}) = 0$ for all $t$ and $k > 0$.

4.24 Let $\{e_t\}$ be a zero-mean, unit-variance white noise process. Consider a process
that begins at time $t = 0$ and is defined recursively as follows. Let $Y_0 = c_1 e_0$ and
$Y_1 = c_2 Y_0 + e_1$. Then let $Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + e_t$ for $t > 1$ as in an AR(2) process.

(a) Show that the process mean is zero.

(b) For particular values of $\phi_1$ and $\phi_2$ within the stationarity region for an AR(2)
model, show how to choose $c_1$ and $c_2$ so that both $\mathrm{Var}(Y_0) = \mathrm{Var}(Y_1)$ and the
lag 1 autocorrelation between $Y_1$ and $Y_0$ match that of a stationary AR(2) process
with parameters $\phi_1$ and $\phi_2$.

(c) Once the process $\{Y_t\}$ is generated, show how to transform it to a new process
that has any desired mean and variance. (This exercise suggests a convenient
method for simulating stationary AR(2) processes.)

4.25 Consider an “AR(1)” process satisfying $Y_t = \phi Y_{t-1} + e_t$, where $\phi$ can be any number
and $\{e_t\}$ is a white noise process such that $e_t$ is independent of the past $\{Y_{t-1}, Y_{t-2}, \ldots\}$.
Let $Y_0$ be a random variable with mean $\mu_0$ and variance $\sigma_0^2$.

(a) Show that for $t > 0$ we can write

$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \phi^3 e_{t-3} + \cdots + \phi^{t-1} e_1 + \phi^t Y_0$.

(b) Show that for $t > 0$ we have $E(Y_t) = \phi^t \mu_0$.

(c) Show that for $t > 0$

$$\mathrm{Var}(Y_t) = \begin{cases} \dfrac{1 - \phi^{2t}}{1 - \phi^2}\,\sigma_e^2 + \phi^{2t}\sigma_0^2 & \text{for } \phi \ne 1 \\[2ex] t\sigma_e^2 + \sigma_0^2 & \text{for } \phi = 1 \end{cases}$$

(d) Suppose now that $\mu_0 = 0$. Argue that, if $\{Y_t\}$ is stationary, we must have $\phi \ne 1$.

(e) Continuing to suppose that $\mu_0 = 0$, show that, if $\{Y_t\}$ is stationary, then
$\mathrm{Var}(Y_t) = \sigma_e^2/(1 - \phi^2)$ and so we must have $|\phi| < 1$.
Appendix B: The Stationarity Region for an AR(2) Process


In the second-order case, the roots of the quadratic characteristic polynomial are easily
found to be

$$\frac{\phi_1 \pm \sqrt{\phi_1^2 + 4\phi_2}}{-2\phi_2} \tag{4.B.1}$$

For stationarity we require that these roots exceed 1 in absolute value. We now
show that this will be true if and only if three conditions are satisfied:

$$\phi_1 + \phi_2 < 1, \quad \phi_2 - \phi_1 < 1, \quad \text{and} \quad |\phi_2| < 1 \tag{4.B.2}$$


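Proof: Let the reciprocals of the roots be denoted $G_1$ and $G_2$. Then

$$G_1 = \frac{2\phi_2}{-\phi_1 - \sqrt{\phi_1^2 + 4\phi_2}} = \frac{2\phi_2\left(-\phi_1 + \sqrt{\phi_1^2 + 4\phi_2}\right)}{\phi_1^2 - (\phi_1^2 + 4\phi_2)} = \frac{\phi_1 - \sqrt{\phi_1^2 + 4\phi_2}}{2}$$

Similarly,

$$G_2 = \frac{\phi_1 + \sqrt{\phi_1^2 + 4\phi_2}}{2}$$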


We now divide the proof into two cases corresponding to real and complex roots.
The roots will be real if and only if $\phi_1^2 + 4\phi_2 \ge 0$.

I. Real Roots: $|G_i| < 1$ for $i = 1$ and $2$ if and only if

$$-1 < \frac{\phi_1 - \sqrt{\phi_1^2 + 4\phi_2}}{2} < \frac{\phi_1 + \sqrt{\phi_1^2 + 4\phi_2}}{2} < 1$$

or if and only if

$$-2 < \phi_1 - \sqrt{\phi_1^2 + 4\phi_2} < \phi_1 + \sqrt{\phi_1^2 + 4\phi_2} < 2.$$

Consider just the first inequality. Now $-2 < \phi_1 - \sqrt{\phi_1^2 + 4\phi_2}$ if and only if
$\sqrt{\phi_1^2 + 4\phi_2} < \phi_1 + 2$ if and only if $\phi_1^2 + 4\phi_2 < \phi_1^2 + 4\phi_1 + 4$ if and only if $\phi_2 < \phi_1 + 1$,
or $\phi_2 - \phi_1 < 1$.

The inequality $\phi_1 + \sqrt{\phi_1^2 + 4\phi_2} < 2$ is treated similarly and leads to $\phi_2 + \phi_1 < 1$.

These inequalities together with $\phi_1^2 + 4\phi_2 \ge 0$ define the stationarity region for the
real root case shown in Exhibit 4.17.

II. Complex Roots: Now $\phi_1^2 + 4\phi_2 < 0$. Here $G_1$ and $G_2$ will be complex conjugates
and $|G_1| = |G_2| < 1$ if and only if $|G_1|^2 < 1$. But $|G_1|^2 = [\phi_1^2 + (-\phi_1^2 - 4\phi_2)]/4 = -\phi_2$,
so that $\phi_2 > -1$. This together with the inequality $\phi_1^2 + 4\phi_2 < 0$ defines the part
of the stationarity region for complex roots shown in Exhibit 4.17 and establishes
Equation (4.3.11). This completes the proof.
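The equivalence just proved can be spot-checked numerically. The following is a minimal sketch in base R, not code from the text: the helper names `ar2_stationary_conditions` and `ar2_stationary_roots` are our own, the first six parameter pairs are borrowed from Exercise 4.9, and the last pair is an added nonstationary example.

```r
# Check AR(2) stationarity two ways: the three inequalities of (4.B.2)
# and the moduli of the roots of 1 - phi1*x - phi2*x^2.
ar2_stationary_conditions <- function(phi1, phi2) {
  (phi1 + phi2 < 1) && (phi2 - phi1 < 1) && (abs(phi2) < 1)
}

ar2_stationary_roots <- function(phi1, phi2) {
  # polyroot() expects coefficients in increasing order of power
  roots <- polyroot(c(1, -phi1, -phi2))
  all(Mod(roots) > 1)
}

# Rows are (phi1, phi2); the final row violates phi1 + phi2 < 1
phi_pairs <- rbind(c(0.6, 0.3), c(-0.4, 0.5), c(1.2, -0.7),
                   c(-1, -0.6), c(0.5, -0.9), c(-0.5, -0.6),
                   c(0.5, 0.6))

by_conditions <- apply(phi_pairs, 1,
                       function(p) ar2_stationary_conditions(p[1], p[2]))
by_roots <- apply(phi_pairs, 1,
                  function(p) ar2_stationary_roots(p[1], p[2]))

print(cbind(phi_pairs, by_conditions, by_roots))
```

The two columns of logical results should agree row by row, with only the last pair flagged as nonstationary.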
Appendix C: The Autocorrelation Function for ARMA(p,q)


Let $\{Y_t\}$ be a stationary, invertible ARMA(p,q) process. Recall that we can always write
such a process in general linear process form as

$$Y_t = \sum_{j=0}^{\infty} \psi_j e_{t-j} \tag{4.C.1}$$

where the $\psi$-weights can be obtained recursively from Equations (4.4.7), on page 79.
We then have

$$E(Y_{t+k} e_t) = E\left[\left(\sum_{j=0}^{\infty} \psi_j e_{t+k-j}\right) e_t\right] = \psi_k \sigma_e^2 \quad \text{for } k \ge 0 \tag{4.C.2}$$


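Thus the autocovariance must satisfy

$$\gamma_k = E(Y_{t+k} Y_t) = E\left[\left(\sum_{j=1}^{p} \phi_j Y_{t+k-j} - \sum_{j=0}^{q} \theta_j e_{t+k-j}\right) Y_t\right] = \sum_{j=1}^{p} \phi_j \gamma_{k-j} - \sigma_e^2 \sum_{j=k}^{q} \theta_j \psi_{j-k} \tag{4.C.3}$$

where $\theta_0 = -1$ and the last sum is absent if $k > q$. Setting $k = 0, 1, \ldots, p$ and using
$\gamma_{-k} = \gamma_k$ leads to $p + 1$ linear equations in $\gamma_0, \gamma_1, \ldots, \gamma_p$:

$$\begin{aligned}
\gamma_0 &= \phi_1\gamma_1 + \phi_2\gamma_2 + \cdots + \phi_p\gamma_p - \sigma_e^2(\theta_0 + \theta_1\psi_1 + \cdots + \theta_q\psi_q) \\
\gamma_1 &= \phi_1\gamma_0 + \phi_2\gamma_1 + \cdots + \phi_p\gamma_{p-1} - \sigma_e^2(\theta_1 + \theta_2\psi_1 + \cdots + \theta_q\psi_{q-1}) \\
&\;\;\vdots \\
\gamma_p &= \phi_1\gamma_{p-1} + \phi_2\gamma_{p-2} + \cdots + \phi_p\gamma_0 - \sigma_e^2(\theta_p + \theta_{p+1}\psi_1 + \cdots + \theta_q\psi_{q-p})
\end{aligned} \tag{4.C.4}$$

where $\theta_j = 0$ if $j > q$.

For a given set of parameter values $\sigma_e^2$, $\phi$’s, and $\theta$’s (and hence $\psi$’s), we can solve
the linear equations to obtain $\gamma_0, \gamma_1, \ldots, \gamma_p$. The values of $\gamma_k$ for $k > p$ can then be
evaluated from the recursion in Equations (4.4.8), on page 79. Finally, $\rho_k$ is obtained from
$\rho_k = \gamma_k/\gamma_0$.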
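As a numerical illustration, the sketch below computes the autocorrelations of the ARMA(1,2) model of Exercise 4.11 in two ways. It does not solve the linear system (4.C.4); instead, as a simpler stand-in, it truncates the general linear process sum $\gamma_k = \sigma_e^2 \sum_j \psi_j \psi_{j+k}$ at 200 terms and compares the result with `ARMAacf()` from R's stats package. Note that R writes moving average terms with plus signs, so the book's $\theta_j$ corresponds to `-ma[j]`; the truncation point and variable names are our own choices.

```r
phi  <- 0.8
ma_r <- c(0.7, 0.6)   # R's sign convention: ... + e_t + 0.7 e_{t-1} + 0.6 e_{t-2}

# psi-weights of the general linear process form (4.C.1); psi_0 = 1
psi <- c(1, ARMAtoMA(ar = phi, ma = ma_r, lag.max = 200))

# gamma_k / sigma_e^2 = sum_j psi_j * psi_{j+k}, truncated at 200 terms
gamma_over_sigma2 <- sapply(0:5, function(k) {
  n <- length(psi)
  sum(psi[1:(n - k)] * psi[(1 + k):n])
})
rho_psi <- gamma_over_sigma2 / gamma_over_sigma2[1]

# Reference values from the stats package
rho_r <- as.numeric(ARMAacf(ar = phi, ma = ma_r, lag.max = 5))

print(round(cbind(lag = 0:5, rho_psi, rho_r), 4))
```

The two columns should agree to many decimal places, and for lags beyond 2 the values decay by the factor 0.8, as Exercise 4.11(a) asserts.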


Chapter  5 


Models for Nonstationary Time Series


Any  time  series  without  a  constant  mean  over  time  is  nonstationary.  Models  of  the  form 

$Y_t = \mu_t + X_t$

where $\mu_t$ is a nonconstant mean function and $X_t$ is a zero-mean, stationary series, were
considered in Chapter 3. As stated there, such models are reasonable only if there are
good reasons for believing that the deterministic trend is appropriate “forever.” That is,
just because a segment of the series looks like it is increasing (or decreasing) approximately
linearly, do we believe that the linearity is intrinsic to the process and will persist
in the future? Frequently in applications, particularly in business and economics, we
cannot legitimately assume a deterministic trend. Recall the random walk displayed in
Exhibit 2.1, on page 14. The time series appears to have a strong upward trend that
might be linear in time. However, also recall that the random walk process has a constant,
zero mean and contains no deterministic trend at all.

As  an  example  consider  the  monthly  price  of  a  barrel  of  crude  oil  from  January 
1986  through  January  2006.  Exhibit  5.1  displays  the  time  series  plot.  The  series  displays 
considerable  variation,  especially  since  2001,  and  a  stationary  model  does  not  seem  to 
be  reasonable.  We  will  discover  in  Chapters  6,  7,  and  8  that  no  deterministic  trend 
model  works  well  for  this  series  but  one  of  the  nonstationary  models  that  have  been 
described  as  containing  stochastic  trends  does  seem  reasonable.  This  chapter  discusses 
such  models.  Fortunately,  as  we  shall  see,  many  stochastic  trends  can  be  modeled  with 
relatively  few  parameters. 




Exhibit  5.1  Monthly  Price  of  Oil:  January  1986-January  2006 


> win.graph(width=4.875, height=3, pointsize=8)
> data(oil.price)
> plot(oil.price, ylab='Price per Barrel', type='l')


5.1  Stationarity  Through  Differencing 


Consider  again  the  AR(1)  model 

$$Y_t = \phi Y_{t-1} + e_t \tag{5.1.1}$$

We have seen that assuming $e_t$ is a true “innovation” (that is, $e_t$ is uncorrelated with
$Y_{t-1}, Y_{t-2}, \ldots$), we must have $|\phi| < 1$. What can we say about solutions to Equation
(5.1.1) if $|\phi| \ge 1$? Consider in particular the equation

$$Y_t = 3Y_{t-1} + e_t \tag{5.1.2}$$

Iterating into the past as we have done before yields

$$Y_t = e_t + 3e_{t-1} + 3^2 e_{t-2} + \cdots + 3^{t-1} e_1 + 3^t Y_0 \tag{5.1.3}$$

We see that the influence of distant past values of $Y_t$ and $e_t$ does not die out; indeed,
the weights applied to $Y_0$ and $e_1$ grow exponentially large. In Exhibit 5.2, we show the
values for a very short simulation of such a series. Here the white noise sequence was
generated as standard normal variables and we used $Y_0 = 0$ as an initial condition.


Exhibit 5.2 Simulation of the Explosive “AR(1) Model” $Y_t = 3Y_{t-1} + e_t$

t          1        2        3        4        5        6        7        8
e_t     0.63    -1.25     1.80     1.51     1.56     0.62     0.64    -0.98
Y_t     0.63     0.64     3.72    12.67    39.57   119.33   358.63  1074.91
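The entries of Exhibit 5.2 can be reproduced by applying the recursion of Equation (5.1.2) to the tabulated noise values. The short sketch below does this in base R; the variable names are ours, and the noise values are simply copied from the exhibit rather than regenerated.

```r
# e_t values copied from Exhibit 5.2
e <- c(0.63, -1.25, 1.80, 1.51, 1.56, 0.62, 0.64, -0.98)

Y <- numeric(length(e))
Y_prev <- 0                     # initial condition Y_0 = 0
for (i in seq_along(e)) {
  Y[i] <- 3 * Y_prev + e[i]     # Equation (5.1.2)
  Y_prev <- Y[i]
}

print(round(Y, 2))
```

After only eight steps the series has already exceeded 1000, even though no noise term is larger than 2 in absolute value.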
Exhibit 5.3 shows the time series plot of this explosive AR(1) simulation.


Exhibit 5.3 An Explosive “AR(1)” Series


> data(explode.s)
> plot(explode.s, ylab=expression(Y[t]), type='o')


The  explosive  behavior  of  such  a  model  is  also  reflected  in  the  model’s  variance 
and  covariance  functions.  These  are  easily  found  to  be 

$$\mathrm{Var}(Y_t) = \frac{1}{8}(9^t - 1)\sigma_e^2 \tag{5.1.4}$$

and

$$\mathrm{Cov}(Y_t, Y_{t-k}) = \frac{3^k}{8}(9^{t-k} - 1)\sigma_e^2 \tag{5.1.5}$$

respectively. Notice that we have

$$\mathrm{Corr}(Y_t, Y_{t-k}) = 3^k \sqrt{\frac{9^{t-k} - 1}{9^t - 1}} \approx 1 \quad \text{for large } t \text{ and moderate } k$$

The same general exponential growth or explosive behavior will occur for any $\phi$
such that $|\phi| > 1$. A more reasonable type of nonstationarity obtains when $\phi = 1$. If $\phi = 1$,
the AR(1) model equation is

$$Y_t = Y_{t-1} + e_t \tag{5.1.6}$$

This is the relationship satisfied by the random walk process of Chapter 2 (Equation
(2.2.9) on page 12). Alternatively, we can rewrite this as

$$\nabla Y_t = e_t \tag{5.1.7}$$

where $\nabla Y_t = Y_t - Y_{t-1}$ is the first difference of $Y_t$. The random walk then is easily
extended to a more general model whose first difference is some stationary process,
not just white noise.

Several somewhat different sets of assumptions can lead to models whose first difference
is a stationary process. Suppose

$$Y_t = M_t + X_t \tag{5.1.8}$$

where $M_t$ is a series that is changing only slowly over time. Here $M_t$ could be either
deterministic or stochastic. If we assume that $M_t$ is approximately constant over every
two consecutive time points, we might estimate (predict) $M_t$ at $t$ by choosing $\beta_{0,t}$ so that

$$\sum_{j=0}^{1} (Y_{t-j} - \beta_{0,t})^2$$

is minimized. This clearly leads to

$$\hat{M}_t = \frac{1}{2}(Y_t + Y_{t-1})$$

and the “detrended” series at time $t$ is then

$$Y_t - \hat{M}_t = Y_t - \frac{1}{2}(Y_t + Y_{t-1}) = \frac{1}{2}(Y_t - Y_{t-1}) = \frac{1}{2}\nabla Y_t$$

This is a constant multiple of the first difference, $\nabla Y_t$.†

A second set of assumptions might be that $M_t$ in Equation (5.1.8) is stochastic and
changes slowly over time governed by a random walk model. Suppose, for example, that

$$Y_t = M_t + e_t \quad \text{with} \quad M_t = M_{t-1} + \varepsilon_t \tag{5.1.9}$$

where $\{e_t\}$ and $\{\varepsilon_t\}$ are independent white noise series. Then

$$\nabla Y_t = \nabla M_t + \nabla e_t = \varepsilon_t + e_t - e_{t-1}$$

which would have the autocorrelation function of an MA(1) series with

$$\rho_1 = -\frac{1}{2 + \sigma_\varepsilon^2/\sigma_e^2} \tag{5.1.10}$$

In either of these situations, we are led to the study of $\nabla Y_t$ as a stationary process.
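Equation (5.1.10) is easy to check by simulation. The sketch below generates a long series from the random-walk-plus-noise model (5.1.9) and compares the sample lag 1 autocorrelation of the differences with the theoretical value. The standard deviations, series length, and seed are arbitrary illustrative choices, not values from the text.

```r
set.seed(1)
n <- 20000
sigma_e   <- 1      # standard deviation of the observation noise e_t
sigma_eps <- 0.5    # standard deviation of the random walk innovations

M <- cumsum(rnorm(n, sd = sigma_eps))   # stochastic trend M_t
Y <- M + rnorm(n, sd = sigma_e)         # observed series Y_t = M_t + e_t

rho1_theory <- -1 / (2 + sigma_eps^2 / sigma_e^2)   # Equation (5.1.10)

# Sample autocorrelations of the first difference at lags 1 and 2
rho_sample <- acf(diff(Y), lag.max = 2, plot = FALSE)$acf[2:3]

print(c(theory = rho1_theory, lag1 = rho_sample[1], lag2 = rho_sample[2]))
```

With these settings the theoretical value is about -0.44; the sample lag 1 value should be close to it, and the lag 2 value close to zero, as expected for an MA(1) autocorrelation pattern.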

Returning to the oil price time series, Exhibit 5.4 displays the time series plot of the
differences of logarithms of that series.‡ The differenced series looks much more stationary
when compared with the original time series shown in Exhibit 5.1, on page 88.
(We will also see later that there are outliers in this series that need to be considered to
produce an adequate model.)


† A more complete labeling of this difference would be that it is a first difference at lag 1.

‡ In Section 5.4 on page 98 we will see why logarithms are often a convenient transformation.


Exhibit 5.4 The Difference Series of the Logs of the Oil Price Time Series


> plot(diff(log(oil.price)), ylab='Change in Log(Price)', type='l')


We can also make assumptions that lead to stationary second-difference models.
Again we assume that Equation (5.1.8) on page 90 holds, but now assume that $M_t$ is linear
in time over three consecutive time points. We can now estimate (predict) $M_t$ at the
middle time point $t$ by choosing $\beta_{0,t}$ and $\beta_{1,t}$ to minimize

$$\sum_{j=-1}^{1} \left(Y_{t-j} - (\beta_{0,t} + j\beta_{1,t})\right)^2$$

The solution yields

$$\hat{M}_t = \frac{1}{3}(Y_{t+1} + Y_t + Y_{t-1})$$

and thus the detrended series is

$$Y_t - \hat{M}_t = Y_t - \frac{Y_{t+1} + Y_t + Y_{t-1}}{3} = \left(-\frac{1}{3}\right)(Y_{t+1} - 2Y_t + Y_{t-1}) = \left(-\frac{1}{3}\right)\nabla(\nabla Y_{t+1}) = \left(-\frac{1}{3}\right)\nabla^2(Y_{t+1})$$

a constant multiple of the centered second difference of $Y_t$. Notice that we have differenced
twice, but both differences are at lag 1.

Alternatively,  we  might  assume  that 




$$Y_t = M_t + e_t, \quad \text{where} \quad M_t = M_{t-1} + W_t \quad \text{and} \quad W_t = W_{t-1} + \varepsilon_t \tag{5.1.11}$$

with $\{e_t\}$ and $\{\varepsilon_t\}$ independent white noise time series. Here the stochastic trend $M_t$ is
such that its “rate of change,” $\nabla M_t$, is changing slowly over time. Then

$$\nabla Y_t = \nabla M_t + \nabla e_t = W_t + \nabla e_t$$

and

$$\nabla^2 Y_t = \nabla W_t + \nabla^2 e_t = \varepsilon_t + e_t - 2e_{t-1} + e_{t-2}$$

which has the autocorrelation function of an MA(2) process. The important point is that
the second difference of the nonstationary process $\{Y_t\}$ is stationary. This leads us to the
general definition of the important integrated autoregressive moving average time series
models.
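Before moving on, the MA(2) autocorrelation pattern claimed for $\nabla^2 Y_t$ under Equation (5.1.11) can be verified directly. The following worked calculation is our own check rather than part of the original text; it uses only the independence of $\{e_t\}$ and $\{\varepsilon_t\}$ and the coefficients $1, -2, 1$ on $e_t, e_{t-1}, e_{t-2}$.

```latex
\begin{aligned}
\gamma_0 &= \mathrm{Var}(\varepsilon_t + e_t - 2e_{t-1} + e_{t-2})
          = \sigma_\varepsilon^2 + (1 + 4 + 1)\sigma_e^2
          = \sigma_\varepsilon^2 + 6\sigma_e^2, \\
\gamma_1 &= [(1)(-2) + (-2)(1)]\,\sigma_e^2 = -4\sigma_e^2, \\
\gamma_2 &= (1)(1)\,\sigma_e^2 = \sigma_e^2, \qquad
\gamma_k = 0 \ \text{for } k > 2, \\
\rho_1 &= \frac{-4\sigma_e^2}{\sigma_\varepsilon^2 + 6\sigma_e^2}, \qquad
\rho_2 = \frac{\sigma_e^2}{\sigma_\varepsilon^2 + 6\sigma_e^2}.
\end{aligned}
```

The autocorrelations vanish beyond lag 2, which is exactly the MA(2) signature.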

5.2  ARIMA  Models 


A time series $\{Y_t\}$ is said to follow an integrated autoregressive moving average
model if the $d$th difference $W_t = \nabla^d Y_t$ is a stationary ARMA process. If $\{W_t\}$ follows an
ARMA(p,q) model, we say that $\{Y_t\}$ is an ARIMA(p,d,q) process. Fortunately, for
practical purposes, we can usually take $d = 1$ or at most 2.

Consider then an ARIMA(p,1,q) process. With $W_t = Y_t - Y_{t-1}$, we have

$$W_t = \phi_1 W_{t-1} + \phi_2 W_{t-2} + \cdots + \phi_p W_{t-p} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q} \tag{5.2.1}$$

or, in terms of the observed series,

$$\begin{aligned}
Y_t - Y_{t-1} = {} & \phi_1(Y_{t-1} - Y_{t-2}) + \phi_2(Y_{t-2} - Y_{t-3}) + \cdots + \phi_p(Y_{t-p} - Y_{t-p-1}) \\
& + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}
\end{aligned}$$

which we may rewrite as

$$\begin{aligned}
Y_t = {} & (1 + \phi_1)Y_{t-1} + (\phi_2 - \phi_1)Y_{t-2} + (\phi_3 - \phi_2)Y_{t-3} + \cdots \\
& + (\phi_p - \phi_{p-1})Y_{t-p} - \phi_p Y_{t-p-1} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}
\end{aligned} \tag{5.2.2}$$

We call this the difference equation form of the model. Notice that it appears to be an
ARMA(p + 1, q) process. However, the characteristic polynomial satisfies

$$\begin{aligned}
1 - (1 + \phi_1)x - (\phi_2 - \phi_1)x^2 - (\phi_3 - \phi_2)x^3 - \cdots - (\phi_p - \phi_{p-1})x^p + \phi_p x^{p+1} \\
= (1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p)(1 - x)
\end{aligned}$$

which can be easily checked. This factorization clearly shows the root at $x = 1$, which
implies nonstationarity. The remaining roots, however, are the roots of the characteristic
polynomial of the stationary process $\nabla Y_t$.
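The factorization can be checked numerically for any particular choice of autoregressive coefficients. The sketch below uses a hypothetical stationary AR(2) part with $\phi_1 = 0.5$ and $\phi_2 = -0.3$ (values chosen by us for illustration) and a small helper, `poly_mult`, that multiplies polynomials stored as coefficient vectors in increasing powers of $x$.

```r
# Multiply two polynomials given as coefficient vectors (increasing powers of x)
poly_mult <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

phi <- c(0.5, -0.3)                          # hypothetical AR(2) coefficients
ar_poly   <- c(1, -phi)                      # 1 - phi1*x - phi2*x^2
full_poly <- poly_mult(ar_poly, c(1, -1))    # multiplied by (1 - x)

# Coefficients read off Equation (5.2.2) with p = 2:
# 1, -(1 + phi1), -(phi2 - phi1), +phi2
direct <- c(1, -(1 + phi[1]), -(phi[2] - phi[1]), phi[2])

print(rbind(full_poly, direct))
root_moduli <- Mod(polyroot(full_poly))
print(root_moduli)     # one modulus equals 1: the unit root at x = 1
```

The two coefficient rows coincide, and exactly one root has modulus 1 while the others lie outside the unit circle.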

Explicit representations of the observed series in terms of either $W_t$ or the white
noise series underlying $W_t$ are more difficult than in the stationary case. Since nonstationary
processes are not in statistical equilibrium, we cannot assume that they go infinitely
into the past or that they start at $t = -\infty$. However, we can and shall assume that
they start at some time point $t = -m$, say, where $-m$ is earlier than time $t = 1$, at which
point we first observed the series. For convenience, we take $Y_t = 0$ for $t < -m$. The difference
equation $Y_t - Y_{t-1} = W_t$ can be solved by summing both sides from time $-m$ up to time
$t$ to get the representation

$$Y_t = \sum_{j=-m}^{t} W_j \tag{5.2.3}$$

for the ARIMA(p,1,q) process.

The ARIMA(p,2,q) process can be dealt with similarly by summing twice to get the
representations

$$Y_t = \sum_{j=-m}^{t} \sum_{i=-m}^{j} W_i = \sum_{j=0}^{t+m} (j + 1) W_{t-j} \tag{5.2.4}$$

These representations have limited use but can be used to investigate the covariance
properties of ARIMA models and also to express $Y_t$ in terms of the white noise series
$\{e_t\}$. We defer the calculations until we evaluate specific cases.

If the process contains no autoregressive terms, we call it an integrated moving
average and abbreviate the name to IMA(d,q). If no moving average terms are present,
we denote the model as ARI(p,d). We first consider in detail the important IMA(1,1)
model.

The IMA(1,1) Model

The simple IMA(1,1) model satisfactorily represents numerous time series, especially
those arising in economics and business. In difference equation form, the model is

$$Y_t = Y_{t-1} + e_t - \theta e_{t-1} \tag{5.2.5}$$

To write $Y_t$ explicitly as a function of present and past noise values, we use Equation
(5.2.3) and the fact that $W_t = e_t - \theta e_{t-1}$ in this case. After a little rearrangement, we can
write

$$Y_t = e_t + (1 - \theta)e_{t-1} + (1 - \theta)e_{t-2} + \cdots + (1 - \theta)e_{-m} - \theta e_{-m-1} \tag{5.2.6}$$

Notice that in contrast to our stationary ARMA models, the weights on the white noise
terms do not die out as we go into the past. Since we are assuming that $-m < 1$ and $0 < t$,
we may usefully think of $Y_t$ as mostly an equally weighted accumulation of a large number
of white noise values.
From Equation (5.2.6), we can easily derive variances and correlations. We have

$$\mathrm{Var}(Y_t) = [1 + \theta^2 + (1 - \theta)^2(t + m)]\sigma_e^2 \tag{5.2.7}$$

and

$$\mathrm{Corr}(Y_t, Y_{t-k}) = \frac{[1 - \theta + \theta^2 + (1 - \theta)^2(t + m - k)]\sigma_e^2}{[\mathrm{Var}(Y_t)\mathrm{Var}(Y_{t-k})]^{1/2}} \approx \sqrt{\frac{t + m - k}{t + m}} \approx 1 \quad \text{for large } m \text{ and moderate } k \tag{5.2.8}$$

We see that as $t$ increases, $\mathrm{Var}(Y_t)$ increases and could be quite large. Also, the correlation
between $Y_t$ and $Y_{t-k}$ will be strongly positive for many lags $k = 1, 2, \ldots$.
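Equations (5.2.7) and (5.2.8) can be confirmed by working directly with the weights in Equation (5.2.6). The sketch below builds the weight vectors for $Y_t$ and $Y_{t-k}$, aligns them on common noise terms, and compares the resulting variance, covariance, and correlation with the formulas. The values of $\theta$, $t$, $m$, and $k$ are arbitrary illustrations, and `ima11_weights` is a helper name of our own.

```r
ima11_weights <- function(theta, tt, m) {
  # Weights on e_t, e_{t-1}, ..., e_{-m}, e_{-m-1} in Equation (5.2.6)
  c(1, rep(1 - theta, tt + m), -theta)
}

theta <- 0.4; tt <- 10; m <- 50; k <- 2; sigma2 <- 1   # illustrative values

w_t  <- ima11_weights(theta, tt, m)
w_tk <- ima11_weights(theta, tt - k, m)

var_direct  <- sigma2 * sum(w_t^2)
var_formula <- (1 + theta^2 + (1 - theta)^2 * (tt + m)) * sigma2   # (5.2.7)

# Y_{t-k} puts no weight on e_t, ..., e_{t-k+1}: pad with k leading zeros
cov_direct  <- sigma2 * sum(w_t * c(rep(0, k), w_tk))
cov_formula <- (1 - theta + theta^2 + (1 - theta)^2 * (tt + m - k)) * sigma2

corr_direct <- cov_direct / sqrt(var_direct * sigma2 * sum(w_tk^2))
corr_approx <- sqrt((tt + m - k) / (tt + m))

print(c(var_direct = var_direct, var_formula = var_formula))
print(c(corr_direct = corr_direct, corr_approx = corr_approx))
```

The exact and formula-based values of the variance and covariance agree, and the square-root approximation to the correlation is already close for this moderate value of $m$.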


The IMA(2,2) Model


The assumptions of Equation (5.1.11) led to an IMA(2,2) model. In difference equation
form, we have

$$\nabla^2 Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}$$

or

$$Y_t = 2Y_{t-1} - Y_{t-2} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} \tag{5.2.9}$$

The representation of Equation (5.2.4) may be used to express $Y_t$ in terms of $e_t, e_{t-1}, \ldots$.
After some tedious algebra, we find that

$$Y_t = e_t + \sum_{j=1}^{t+m} \psi_j e_{t-j} - [(t + m + 1)\theta_1 + (t + m)\theta_2]e_{-m-1} - (t + m + 1)\theta_2 e_{-m-2} \tag{5.2.10}$$

where $\psi_j = 1 + \theta_2 + (1 - \theta_1 - \theta_2)j$ for $j = 1, 2, 3, \ldots, t + m$. Once more we see that the
$\psi$-weights do not die out but form a linear function of $j$.

Again, variances and correlations for $Y_t$ can be obtained from the representation
given in Equation (5.2.10), but the calculations are tedious. We shall simply note that the
variance of $Y_t$ increases rapidly with $t$ and again $\mathrm{Corr}(Y_t, Y_{t-k})$ is nearly 1 for all moderate
$k$.

The  results  of  a  simulation  of  an  IMA(2,2)  process  are  displayed  in  Exhibit  5.5. 
Notice  the  smooth  change  in  the  process  values  (and  the  unimportance  of  the  zero-mean 
function).  The  increasing  variance  and  the  strong,  positive  neighboring  correlations 
dominate  the  appearance  of  the  time  series  plot. 
Exhibit 5.5 Simulation of an IMA(2,2) Series with $\theta_1 = 1$ and $\theta_2 = -0.6$


> data(ima22.s)
> plot(ima22.s, ylab='IMA(2,2) Simulation', type='o')


Exhibit  5.6  shows  the  time  series  plot  of  the  first  difference  of  the  simulated  series. 
This  series  is  also  nonstationary,  as  it  is  governed  by  an  IMA(1,2)  model. 


Exhibit 5.6 First Difference of the Simulated IMA(2,2) Series


> plot(diff(ima22.s), ylab='First Difference', type='o')


Finally, the second differences of the simulated IMA(2,2) series values are plotted
in Exhibit 5.7. These values arise from a stationary MA(2) model with $\theta_1 = 1$ and
$\theta_2 = -0.6$. From Equation (4.2.3) on page 63, the theoretical autocorrelations for this model
are $\rho_1 = -0.678$ and $\rho_2 = 0.254$. These correlation values seem to be reflected in the
appearance of the time series plot.
Exhibit 5.7 Second Difference of the Simulated IMA(2,2) Series


> plot(diff(ima22.s, difference=2), ylab='Differenced Twice', type='o')

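A series of this kind can also be generated from scratch. The sketch below is not the code that produced the book's `ima22.s` data set; it is a minimal base R illustration, with our own seed and length, that builds the MA(2) second differences, integrates them twice with zero starting values, and confirms both the theoretical autocorrelations quoted above and the fact that differencing twice recovers the MA(2) series. Recall that `ARMAacf()` writes moving average terms with plus signs, so the book's $\theta_j$ enters as `-theta_j`.

```r
set.seed(123)
theta1 <- 1; theta2 <- -0.6    # book's convention: W_t = e_t - theta1*e_{t-1} - theta2*e_{t-2}
n <- 60
e <- rnorm(n + 2)

# Stationary MA(2) series of second differences
W <- e[3:(n + 2)] - theta1 * e[2:(n + 1)] - theta2 * e[1:n]

# Integrate twice (zero starting values) to obtain an IMA(2,2) series
Y <- cumsum(cumsum(W))

# Theoretical MA(2) autocorrelations at lags 0 to 3
rho <- ARMAacf(ma = c(-theta1, -theta2), lag.max = 3)
print(round(rho, 3))

# Differencing twice recovers W, apart from the two values lost to differencing
recovery_error <- max(abs(diff(Y, differences = 2) - W[-(1:2)]))
print(recovery_error)
```

Plotting `Y`, `diff(Y)`, and `diff(Y, differences = 2)` with `type = 'o'` gives pictures of the same general character as Exhibits 5.5 through 5.7.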

The ARI(1,1) Model

The ARI(1,1) process will satisfy

$$Y_t - Y_{t-1} = \phi(Y_{t-1} - Y_{t-2}) + e_t \tag{5.2.11}$$

or

$$Y_t = (1 + \phi)Y_{t-1} - \phi Y_{t-2} + e_t \tag{5.2.12}$$

where $|\phi| < 1$.†

† Notice that this looks like a special AR(2) model. However, one of the roots of the
corresponding AR(2) characteristic polynomial is 1, and this is not allowed in stationary AR(2)
models.

To find the $\psi$-weights in this case, we shall use a technique that will generalize to
arbitrary ARIMA models. It can be shown that the $\psi$-weights can be obtained by equating
like powers of $x$ in the identity:

$$\begin{aligned}
(1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p)(1 - x)^d(1 + \psi_1 x + \psi_2 x^2 + \psi_3 x^3 + \cdots) \\
= (1 - \theta_1 x - \theta_2 x^2 - \theta_3 x^3 - \cdots - \theta_q x^q)
\end{aligned}$$

In our case, this relationship reduces to

$$(1 - \phi x)(1 - x)(1 + \psi_1 x + \psi_2 x^2 + \psi_3 x^3 + \cdots) = 1 \tag{5.2.13}$$

or

$$[1 - (1 + \phi)x + \phi x^2](1 + \psi_1 x + \psi_2 x^2 + \psi_3 x^3 + \cdots) = 1$$

Equating like powers of $x$ on both sides, we obtain

$$\begin{aligned}
-(1 + \phi) + \psi_1 &= 0 \\
\phi - (1 + \phi)\psi_1 + \psi_2 &= 0
\end{aligned}$$

and, in general,

$$\psi_k = (1 + \phi)\psi_{k-1} - \phi\psi_{k-2} \quad \text{for } k \ge 2 \tag{5.2.14}$$

with $\psi_0 = 1$ and $\psi_1 = 1 + \phi$. This recursion with starting values allows us to compute as
many $\psi$-weights as necessary. It can also be shown that in this case an explicit solution
to the recursion is given as

$$\psi_k = \frac{1 - \phi^{k+1}}{1 - \phi} \quad \text{for } k \ge 1 \tag{5.2.15}$$

(It is easy, for example, to show that this expression satisfies Equation (5.2.14).)
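The agreement between the recursion (5.2.14) and the explicit solution (5.2.15) is easily confirmed numerically. In the sketch below, `ari11_psi` is a helper of our own and $\phi = 0.5$ is an arbitrary illustrative value.

```r
ari11_psi <- function(phi, n_weights) {
  psi <- numeric(n_weights + 1)            # psi[k + 1] holds psi_k
  psi[1] <- 1                              # psi_0 = 1
  if (n_weights >= 1) psi[2] <- 1 + phi    # psi_1 = 1 + phi
  if (n_weights >= 2) {
    for (k in 2:n_weights) {
      # Equation (5.2.14): psi_k = (1 + phi) psi_{k-1} - phi psi_{k-2}
      psi[k + 1] <- (1 + phi) * psi[k] - phi * psi[k - 1]
    }
  }
  psi
}

phi <- 0.5
k <- 0:10
psi_rec      <- ari11_psi(phi, 10)
psi_explicit <- (1 - phi^(k + 1)) / (1 - phi)   # Equation (5.2.15)

print(round(cbind(k, psi_rec, psi_explicit), 4))
```

The weights increase toward the limit $1/(1 - \phi)$ rather than dying out, which is the nonstationary counterpart of the decaying $\psi$-weights of a stationary AR(1) model.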

5.3  Constant  Terms  in  ARIMA  Models 


For an ARIMA(p,d,q) model, $\nabla^d Y_t = W_t$ is a stationary ARMA(p,q) process. Our standard
assumption is that stationary models have a zero mean; that is, we are actually
working with deviations from the constant mean. A nonzero constant mean, $\mu$, in a stationary
ARMA model $\{W_t\}$ can be accommodated in either of two ways. We can
assume that

$$\begin{aligned}
W_t - \mu = {} & \phi_1(W_{t-1} - \mu) + \phi_2(W_{t-2} - \mu) + \cdots + \phi_p(W_{t-p} - \mu) \\
& + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}
\end{aligned}$$

Alternatively, we can introduce a constant term $\theta_0$ into the model as follows:

$$\begin{aligned}
W_t = {} & \theta_0 + \phi_1 W_{t-1} + \phi_2 W_{t-2} + \cdots + \phi_p W_{t-p} \\
& + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}
\end{aligned}$$

Taking expected values on both sides of the latter expression, we find that

$$\mu = \theta_0 + (\phi_1 + \phi_2 + \cdots + \phi_p)\mu$$

so that

$$\mu = \frac{\theta_0}{1 - \phi_1 - \phi_2 - \cdots - \phi_p} \tag{5.3.16}$$

or, conversely, that

$$\theta_0 = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p) \tag{5.3.17}$$

Since the alternative representations are equivalent, we shall use whichever parameterization
is convenient.
What will be the effect of a nonzero mean for $W_t$ on the undifferenced series $Y_t$?
Consider the IMA(1,1) case with a constant term. We have

$$Y_t = Y_{t-1} + \theta_0 + e_t - \theta e_{t-1}$$

or

$$W_t = \theta_0 + e_t - \theta e_{t-1}$$

Either by substituting into Equation (5.2.3) on page 93 or by iterating into the past, we
find that

$$\begin{aligned}
Y_t = {} & e_t + (1 - \theta)e_{t-1} + (1 - \theta)e_{t-2} + \cdots + (1 - \theta)e_{-m} - \theta e_{-m-1} \\
& + (t + m + 1)\theta_0
\end{aligned} \tag{5.3.18}$$

Comparing this with Equation (5.2.6), we see that we have an added linear deterministic
time trend $(t + m + 1)\theta_0$ with slope $\theta_0$.

An equivalent representation of the process would then be

$$Y_t = Y_t' + \beta_0 + \beta_1 t$$

where $Y_t'$ is an IMA(1,1) series with $E(\nabla Y_t') = 0$ and $E(\nabla Y_t) = \beta_1$.

For a general ARIMA(p,d,q) model where $E(\nabla^d Y_t) \ne 0$, it can be argued that
$Y_t = Y_t' + \mu_t$, where $\mu_t$ is a deterministic polynomial of degree $d$ and $Y_t'$ is ARIMA(p,d,q)
with $E(Y_t') = 0$. With $d = 2$ and $\theta_0 \ne 0$, a quadratic trend would be implied.
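The linear trend induced by a constant term can be seen by driving two IMA(1,1) series with the same noise, one with and one without $\theta_0$. The sketch below is an illustration under our own choices of $\theta$, $\theta_0$, series length, and seed; both series start from zero.

```r
set.seed(42)
n <- 100
theta  <- 0.6    # moving average parameter (illustrative)
theta0 <- 0.2    # constant term (illustrative)

e  <- rnorm(n + 1)
W0 <- e[-1] - theta * e[-(n + 1)]        # e_t - theta * e_{t-1}, no constant

Y_no_const   <- cumsum(W0)               # IMA(1,1) without a constant term
Y_with_const <- cumsum(theta0 + W0)      # same noise, constant term added

# The two series differ by a deterministic line with slope theta0
trend <- Y_with_const - Y_no_const
print(head(trend))
print(mean(diff(Y_with_const)) - mean(diff(Y_no_const)))
```

The difference between the two series is exactly $\theta_0$ times the number of elapsed time points, matching the added term in Equation (5.3.18).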

5.4  Other  Transformations 


We  have  seen  how  differencing  can  be  a  useful  transformation  for  achieving  stationarity. 
However,  the  logarithm  transformation  is  also  a  useful  method  in  certain  circumstances. 
We  frequently  encounter  series  where  increased  dispersion  seems  to  be  associated  with 
higher  levels  of  the  series — the  higher  the  level  of  the  series,  the  more  variation  there  is 
around  that  level  and  conversely. 

Specifically, suppose that $Y_t > 0$ for all $t$ and that

$$E(Y_t) = \mu_t \quad \text{and} \quad \sqrt{\mathrm{Var}(Y_t)} = \mu_t\sigma \tag{5.4.1}$$

Then

$$E[\log(Y_t)] \approx \log(\mu_t) \quad \text{and} \quad \mathrm{Var}(\log(Y_t)) \approx \sigma^2 \tag{5.4.2}$$

These results follow from taking expected values and variances of both sides of the
(Taylor) expansion

$$\log(Y_t) \approx \log(\mu_t) + \frac{Y_t - \mu_t}{\mu_t}$$

In words, if the standard deviation of the series is proportional to the level of the series,
then transforming to logarithms will produce a series with approximately constant variance
over time. Also, if the level of the series is changing roughly exponentially, the
log-transformed series will exhibit a linear time trend. Thus, we might then want to take
first differences. An alternative set of assumptions leading to differences of logged data
follows.

Percentage  Changes  and  Logarithms 

Suppose $Y_t$ tends to have relatively stable percentage changes from one time period to
the next. Specifically, assume that

$$Y_t = (1 + X_t)Y_{t-1}$$

where $100X_t$ is the percentage change (possibly negative) from $Y_{t-1}$ to $Y_t$. Then

$$\log(Y_t) - \log(Y_{t-1}) = \log\left(\frac{Y_t}{Y_{t-1}}\right) = \log(1 + X_t)$$

If $X_t$ is restricted to, say, $|X_t| < 0.2$ (that is, the percentage changes are at most ±20%),
then, to a good approximation, $\log(1 + X_t) \approx X_t$. Consequently,

$$\nabla[\log(Y_t)] \approx X_t \tag{5.4.3}$$

will be relatively stable and perhaps well-modeled by a stationary process. Notice that
we take logs first and then compute first differences; the order does matter. In financial
literature, the differences of the (natural) logarithms are usually called returns.
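A small numerical example shows how good the approximation in Equation (5.4.3) is at the edges of the ±20% range. The percentage changes and the starting level of 100 below are hypothetical values chosen for illustration.

```r
# Hypothetical proportional changes X_t (that is, percentage changes / 100)
X <- c(0.05, -0.10, 0.20, -0.20, 0.01)

# Build the series from Y_t = (1 + X_t) * Y_{t-1}, starting at 100
Y <- 100 * cumprod(c(1, 1 + X))

# Logs first, then first differences: this equals log(1 + X_t) exactly
log_diff <- diff(log(Y))

print(round(cbind(X, log_diff, error = log_diff - X), 4))
```

The approximation error is largest for the ±20% changes, at roughly 0.02, and is negligible for small changes. Reversing the order of operations, that is, taking logs of `diff(Y)`, is a different calculation and is not even defined here because some of the differences are negative.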

As  an  example,  consider  the  time  series  shown  in  Exhibit  5.8.  This  series  gives  the 
total  monthly  electricity  generated  in  the  United  States  in  millions  of  kilowatt-hours. 
The  higher  values  display  considerably  more  variation  than  the  lower  values. 


Exhibit 5.8 U.S. Electricity Generated by Month

> data(electricity); plot(electricity)

Exhibit 5.9 displays the time series plot of the logarithms of the electricity values.
Notice how the amount of variation around the upward trend is now much more uniform
across high and low values of the series.


Exhibit  5.9  Time  Series  Plot  of  Logarithms  of  Electricity  Values 


The  differences  of  the  logarithms  of  the  electricity  values  are  displayed  in  Exhibit 
5.10.  On  the  basis  of  this  plot,  we  might  well  consider  a  stationary  model  as  appropriate. 


Exhibit 5.10 Difference of Logarithms for Electricity Time Series

> plot(diff(log(electricity)),
    ylab='Difference of Log(electricity)')

A flexible family of transformations, the power transformations, was introduced by
Box and Cox (1964). For a given value of the parameter $\lambda$, the transformation is defined
by

$$g(x) = \begin{cases} \dfrac{x^{\lambda} - 1}{\lambda} & \text{for } \lambda \neq 0 \\ \log x & \text{for } \lambda = 0 \end{cases}$$  (5.4.4)


The term $x^{\lambda}$ is the important part of the first expression, but subtracting 1 and dividing
by $\lambda$ makes $g(x)$ change smoothly as $\lambda$ approaches zero. In fact, a calculus argument†
shows that as $\lambda \to 0$, $(x^{\lambda} - 1)/\lambda \to \log(x)$. Notice that $\lambda = 1/2$ produces a square root
transformation useful with Poisson-like data, and $\lambda = -1$ corresponds to a reciprocal
transformation.
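A direct implementation of the power transformation makes the smooth transition at λ = 0 easy to verify numerically. The function name `box_cox` and the tolerance used to treat λ as zero are assumptions made for this sketch.

```r
# Power (Box-Cox) transformation g(x) for positive data.
# `box_cox` is a hypothetical helper name; the 1e-8 cutoff is an assumed tolerance.
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) {
    log(x)                       # the lambda = 0 case
  } else {
    (x^lambda - 1) / lambda      # the lambda != 0 case
  }
}

x <- c(0.5, 1, 2, 4)
print(box_cox(x, 0.5))    # square-root-type transformation
print(box_cox(x, -1))     # reciprocal-type transformation
print(box_cox(x, 1e-6))   # very close to log(x)
print(log(x))
```

For small nonzero λ the output is numerically indistinguishable from the logarithm, which is the limiting behavior stated above.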

The power transformation applies only to positive data values. If some of the values
are negative or zero, a positive constant may be added to all of the values to make them
all positive before doing the power transformation. The shift is often determined
subjectively. For example, for nonnegative catch data in biology, the occurrence of zeros is
often dealt with by adding a constant equal to the smallest positive data value to all of
the data values. An alternative approach consists of using transformations applicable to
any data, positive or not. A drawback of this alternative approach is that interpretations
of such transformations are often less straightforward than the interpretations of the
power transformations. See Yeo and Johnson (2000) and the references contained
therein.

We can consider $\lambda$ as an additional parameter in the model to be estimated from the
observed data. However, precise estimation of $\lambda$ is usually not warranted. Evaluation of
a range of transformations based on a grid of $\lambda$ values, say ±1, ±1/2, ±1/3, ±1/4, and 0,
will usually suffice and may have some intuitive meaning.

Software allows us to consider a range of lambda values and calculate a log-likelihood
value for each lambda value based on a normal likelihood function. A plot of these
values is shown in Exhibit 5.11 for the electricity data. The 95% confidence interval for
$\lambda$ contains the value of $\lambda = 0$ quite near its center and strongly suggests a logarithmic
transformation ($\lambda = 0$) for these data.


† Exercise 5.17 asks you to verify this.

Exhibit 5.11 Log-likelihood versus Lambda

> BoxCox.ar(electricity)


5.5  Summary 


This chapter introduced the concept of differencing to induce stationarity on certain
nonstationary processes. This led to the important integrated autoregressive moving
average models (ARIMA). The properties of these models were then thoroughly
explored. Other transformations, namely percentage changes and logarithms, were then
considered. More generally, power transformations or Box-Cox transformations were
introduced as useful transformations to stationarity and often normality.

Exercises


5.1 Identify the following as specific ARIMA models. That is, what are p, d, and q
and what are the values of the parameters (the $\phi$'s and $\theta$'s)?

(a) $Y_t = Y_{t-1} - 0.25Y_{t-2} + e_t - 0.1e_{t-1}$.

(b) $Y_t = 2Y_{t-1} - Y_{t-2} + e_t$.

(c) $Y_t = 0.5Y_{t-1} - 0.5Y_{t-2} + e_t - 0.5e_{t-1} + 0.25e_{t-2}$.

5.2 For each of the ARIMA models below, give the values for $E(\nabla Y_t)$ and $Var(\nabla Y_t)$.

(a) $Y_t = 3 + Y_{t-1} + e_t - 0.75e_{t-1}$.

(b) $Y_t = 10 + 1.25Y_{t-1} - 0.25Y_{t-2} + e_t - 0.1e_{t-1}$.

(c) $Y_t = 5 + 2Y_{t-1} - 1.7Y_{t-2} + 0.7Y_{t-3} + e_t - 0.5e_{t-1} + 0.25e_{t-2}$.

5.3 Suppose that $\{Y_t\}$ is generated according to
$Y_t = e_t + ce_{t-1} + ce_{t-2} + ce_{t-3} + \cdots + ce_0$ for $t > 0$.

(a) Find the mean and covariance functions for $\{Y_t\}$. Is $\{Y_t\}$ stationary?

(b) Find the mean and covariance functions for $\{\nabla Y_t\}$. Is $\{\nabla Y_t\}$ stationary?

(c) Identify $\{Y_t\}$ as a specific ARIMA process.

5.4 Suppose that $Y_t = A + Bt + X_t$, where $\{X_t\}$ is a random walk. First suppose that A
and B are constants.

(a) Is $\{Y_t\}$ stationary?

(b) Is $\{\nabla Y_t\}$ stationary?

Now suppose that A and B are random variables that are independent of the random
walk $\{X_t\}$.

(c) Is $\{Y_t\}$ stationary?

(d) Is $\{\nabla Y_t\}$ stationary?

5.5 Using the simulated white noise values in Exhibit 5.2, on page 88, verify the values
shown for the explosive process $Y_t$.

5.6 Consider a stationary process $\{Y_t\}$. Show that if $\rho_1 < 1/2$, $\nabla Y_t$ has a larger variance
than does $Y_t$.

5.7 Consider two models:

A: $Y_t = 0.9Y_{t-1} + 0.09Y_{t-2} + e_t$.

B: $Y_t = Y_{t-1} + e_t - 0.1e_{t-1}$.

(a) Identify each as a specific ARIMA model. That is, what are p, d, and q and
what are the values of the parameters, $\phi$'s and $\theta$'s?

(b) In what ways are the two models different?

(c) In what ways are the two models similar? (Compare $\psi$-weights and
$\pi$-weights.)

5.8 Consider a nonstationary "AR(1)" process defined as a solution to Equation
(5.1.2) on page 88, with $|\phi| > 1$.

(a) Derive an equation similar to Equation (5.1.3) on page 88, for this more general
case. Use $Y_0 = 0$ as an initial condition.

(b) Derive an equation similar to Equation (5.1.4) on page 89, for this more general
case.

(c) Derive an equation similar to Equation (5.1.5) on page 89, for this more general
case.

(d) Is it true that for any $|\phi| > 1$, $Corr(Y_t, Y_{t-k}) \approx 1$ for large t and moderate k?

5.9 Verify Equation (5.1.10) on page 90.

5.10 Nonstationary ARIMA series can be simulated by first simulating the corresponding
stationary ARMA series and then "integrating" it (really partially summing
it). Use statistical software to simulate a variety of IMA(1,1) and IMA(2,2) series
with a variety of parameter values. Note any stochastic "trends" in the simulated
series.

5.11 The data file Winnebago contains monthly unit sales of recreational vehicles
(RVs) from Winnebago, Inc., from November 1966 through February 1972.

(a) Display and interpret the time series plot for these data.

(b) Now take natural logarithms of the monthly sales figures and display the time
series plot of the transformed values. Describe the effect of the logarithms on
the behavior of the series.

(c) Calculate the fractional relative changes, $(Y_t - Y_{t-1})/Y_{t-1}$, and compare them
with the differences of (natural) logarithms, $\nabla\log(Y_t) = \log(Y_t) - \log(Y_{t-1})$.
How do they compare for smaller values and for larger values?

5.12 The data file SP contains quarterly Standard & Poor's Composite Index stock
price values from the first quarter of 1936 through the fourth quarter of 1977.

(a) Display and interpret the time series plot for these data.

(b) Now take natural logarithms of the quarterly values and display the time
series plot of the transformed values. Describe the effect of the logarithms on
the behavior of the series.

(c) Calculate the (fractional) relative changes, $(Y_t - Y_{t-1})/Y_{t-1}$, and compare
them to the differences of (natural) logarithms, $\nabla\log(Y_t)$. How do they compare
for smaller values and for larger values?

5.13 The data file airpass contains international airline passenger monthly totals (in
thousands) flown from January 1960 through December 1971. This is a classic
time series analyzed in Box and Jenkins (1976).

(a) Display and interpret the time series plot for these data.

(b) Now take natural logarithms of the monthly values and display the time
series plot of the transformed values. Describe the effect of the logarithms on
the behavior of the series.

(c) Calculate the (fractional) relative changes, $(Y_t - Y_{t-1})/Y_{t-1}$, and compare
them to the differences of (natural) logarithms, $\nabla\log(Y_t)$. How do they compare
for smaller values and for larger values?

5.14 Consider the annual rainfall data for Los Angeles shown in Exhibit 1.1, on page 2.
The quantile-quantile normal plot of these data, shown in Exhibit 3.17, on page
50, convinced us that the data were not normal. The data are in the file larain.

(a) Use software to produce a plot similar to Exhibit 5.11, on page 102, and determine
the "best" value of $\lambda$ for a power transformation of the data.

(b) Display a quantile-quantile plot of the transformed data. Are they more normal?

(c) Produce a time series plot of the transformed values.

(d) Use the transformed values to display a plot of $Y_t$ versus $Y_{t-1}$ as in Exhibit
1.2, on page 2. Should we expect the transformation to change the dependence
or lack of dependence in the series?

5.15 Quarterly earnings per share for the Johnson & Johnson Company are given in the
data file named JJ. The data cover the years from 1960 through 1980.

(a) Display a time series plot of the data. Interpret the interesting features in the
plot.

(b) Use software to produce a plot similar to Exhibit 5.11, on page 102, and determine
the "best" value of $\lambda$ for a power transformation of these data.

(c) Display a time series plot of the transformed values. Does this plot suggest
that a stationary model might be appropriate?

(d) Display a time series plot of the differences of the transformed values. Does
this plot suggest that a stationary model might be appropriate for the differences?

5.16  The  file  named  gold  contains  the  daily  price  of  gold  (in  dollars  per  troy  ounce)  for 
the  252  trading  days  of  year  2005. 

(a)  Display  the  time  series  plot  of  these  data.  Interpret  the  plot. 

(b)  Display  the  time  series  plot  of  the  differences  of  the  logarithms  of  these  data. 
Interpret  this  plot. 

(c)  Calculate  and  display  the  sample  ACF  for  the  differences  of  the  logarithms  of 
these  data  and  argue  that  the  logarithms  appear  to  follow  a  random  walk 
model. 

(d)  Display  the  differences  of  logs  in  a  histogram  and  interpret. 

(e) Display the differences of logs in a quantile-quantile normal plot and interpret.

5.17 Use calculus to show that, for any fixed $x > 0$, as $\lambda \to 0$, $(x^{\lambda} - 1)/\lambda \to \log x$.

Many other books and much of the time series literature use what is called the backshift
operator to express and manipulate ARIMA models. The backshift operator, denoted B,
operates on the time index of a series and shifts time back one time unit to form a new
series. (Sometimes B is called a lag operator.) In particular,

$$BY_t = Y_{t-1}$$

The backshift operator is linear since for any constants a, b, and c and series $Y_t$ and $X_t$, it
is easy to see that

$$B(aY_t + bX_t + c) = aBY_t + bBX_t + c$$

Consider now the MA(1) model. In terms of B, we can write

$$Y_t = e_t - \theta e_{t-1} = e_t - \theta Be_t = (1 - \theta B)e_t = \theta(B)e_t$$

where $\theta(B)$ is the MA characteristic polynomial "evaluated" at B.

Since $BY_t$ is itself a time series, it is meaningful to consider $BBY_t$. But clearly
$BBY_t = BY_{t-1} = Y_{t-2}$, and we can write

$$B^2 Y_t = Y_{t-2}$$

More generally, we have

$$B^m Y_t = Y_{t-m}$$

for any positive integer m. For a general MA(q) model, we can then write

$$Y_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}$$
$$\quad = e_t - \theta_1 Be_t - \theta_2 B^2 e_t - \cdots - \theta_q B^q e_t$$
$$\quad = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)e_t$$

or

$$Y_t = \theta(B)e_t$$

where, again, $\theta(B)$ is the MA characteristic polynomial evaluated at B.

For autoregressive models AR(p), we first move all of the terms involving Y to the
left-hand side

$$Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-2} - \cdots - \phi_p Y_{t-p} = e_t$$

and then write

$$Y_t - \phi_1 BY_t - \phi_2 B^2 Y_t - \cdots - \phi_p B^p Y_t = e_t$$

or

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)Y_t = e_t$$

which can be expressed as

$$\phi(B)Y_t = e_t$$

where $\phi(B)$ is the AR characteristic polynomial evaluated at B.

Combining the two, the general ARMA(p,q) model may be written compactly as

$$\phi(B)Y_t = \theta(B)e_t$$

Differencing can also be conveniently expressed in terms of B. We have

$$\nabla Y_t = Y_t - Y_{t-1} = Y_t - BY_t = (1 - B)Y_t$$

with second differences given by

$$\nabla^2 Y_t = (1 - B)^2 Y_t$$

Effectively, $\nabla = 1 - B$ and $\nabla^2 = (1 - B)^2$.

The general ARIMA(p,d,q) model is expressed concisely as

$$\phi(B)(1 - B)^d Y_t = \theta(B)e_t$$

In the literature, one must carefully distinguish from the context the use of B as a
backshift operator and its use as an ordinary real (or complex) variable. For example,
the stationarity condition is frequently given by stating that the roots of $\phi(B) = 0$ must be
greater than 1 in absolute value or, equivalently, must lie outside the unit circle in the
complex plane. Here B is to be treated as a dummy variable in an equation rather than as
the backshift operator.
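The identity ∇² = (1 − B)² = 1 − 2B + B² can be confirmed on a short numeric example. This is a minimal sketch; the series values below are made up for illustration.

```r
# A short made-up series.
y <- c(3, 5, 4, 8, 10, 9, 13)
n <- length(y)

# Second differences via R's built-in differencing.
d2_builtin <- diff(y, differences = 2)

# Second differences via the expanded operator (1 - B)^2 = 1 - 2B + B^2,
# that is, Y_t - 2*Y_{t-1} + Y_{t-2} for t = 3, ..., n.
d2_poly <- y[3:n] - 2 * y[2:(n - 1)] + y[1:(n - 2)]

print(d2_builtin)
print(d2_poly)
```

Both computations give the same values, which is simply the statement that applying (1 − B) twice equals applying the expanded polynomial once.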


Chapter  6 


Model  Specification 


We have developed a large class of parametric models for both stationary and nonstationary
time series: the ARIMA models. We now begin our study and implementation
of statistical inference for such models. The subjects of the next three chapters,
respectively, are:

1. how to choose appropriate values for p, d, and q for a given series;

2. how to estimate the parameters of a specific ARIMA(p,d,q) model;

3. how to check on the appropriateness of the fitted model and improve it if needed.

Our overall strategy will first be to decide on reasonable, but tentative, values
for p, d, and q. Having done so, we shall estimate the $\phi$'s, $\theta$'s, and $\sigma_e$ for that model in
the most efficient way. Finally, we shall look critically at the fitted model thus obtained
to check its adequacy, in much the same way that we did in Section 3.6 on page 42. If
the model appears inadequate in some way, we consider the nature of the inadequacy to
help us select another model. We proceed to estimate that new model and check it for
adequacy.

With a few iterations of this model-building strategy, we hope to arrive at the best
possible model for a given series. The book by George E. P. Box and G. M. Jenkins
(1976) so popularized this technique that many authors call the procedure the
"Box-Jenkins method." We begin by continuing our investigation of the properties of the
sample autocorrelation function.

6.1  Properties  of  the  Sample  Autocorrelation  Function 


Recall from page 46 the definition of the sample or estimated autocorrelation function.
For the observed series $Y_1, Y_2, \ldots, Y_n$, we have

$$r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} \quad\text{for } k = 1, 2, \ldots$$  (6.1.1)

Our goal is to recognize, to the extent possible, patterns in $r_k$ that are characteristic
of the known patterns in $\rho_k$ for common ARMA models. For example, we know that
$\rho_k = 0$ for $k > q$ in an MA(q) model. However, as the $r_k$ are only estimates of the $\rho_k$, we
need to investigate their sampling properties to facilitate the comparison of estimated
correlations with theoretical correlations.

From the definition of $r_k$, a ratio of quadratic functions of possibly dependent variables,
it should be apparent that the sampling properties of $r_k$ will not be obtained easily.
Even the expected value of $r_k$ is difficult to determine; recall that the expected value of
a ratio is not the ratio of the respective expected values. We shall be content to accept a
general large-sample result and consider its implications in special cases. Bartlett (1946)
carried out the original work. We shall take a more general result from Anderson (1971).
A recent discussion of these results may be found in Shumway and Stoffer (2006, p.
519).

We suppose that

$$Y_t = \mu + \sum_{j=0}^{\infty} \psi_j e_{t-j}$$

where the $e_t$ are independent and identically distributed with zero means and finite,
nonzero, common variances. We assume further that

$$\sum_{j=0}^{\infty} |\psi_j| < \infty \quad\text{and}\quad \sum_{j=0}^{\infty} j\psi_j^2 < \infty$$

(These will be satisfied by any stationary ARMA model.)

Then, for any fixed m, the joint distribution of

$$\sqrt{n}(r_1 - \rho_1),\ \sqrt{n}(r_2 - \rho_2),\ \ldots,\ \sqrt{n}(r_m - \rho_m)$$

approaches, as $n \to \infty$, a joint normal distribution with zero means, variances $c_{jj}$, and
covariances $c_{ij}$, where

$$c_{ij} = \sum_{k=-\infty}^{\infty} \left(\rho_{k+i}\rho_{k+j} + \rho_{k-i}\rho_{k+j} - 2\rho_i\rho_k\rho_{k+j} - 2\rho_j\rho_k\rho_{k+i} + 2\rho_i\rho_j\rho_k^2\right)$$  (6.1.2)

For large n, we would say that $r_k$ is approximately normally distributed with mean $\rho_k$
and variance $c_{kk}/n$. Furthermore, $Corr(r_k, r_j) \approx c_{kj}/\sqrt{c_{kk}c_{jj}}$. Notice that the approximate
variance of $r_k$ is inversely proportional to the sample size, but $Corr(r_k, r_j)$ is
approximately constant for large n.
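Equation (6.1.2) can be evaluated numerically by truncating the infinite sum. The sketch below is an illustration under assumptions: the helper name `bartlett_c`, the truncation point K = 500, and the MA(1) example with θ = 0.5 (so that ρ_1 = −θ/(1 + θ²) = −0.4) are all choices made here, not part of the text.

```r
# Truncated evaluation of c_ij in Equation (6.1.2).
# `rho` is a function returning the autocorrelation at (possibly negative) lags.
# `bartlett_c` and the truncation point K are hypothetical choices for this sketch.
bartlett_c <- function(i, j, rho, K = 500) {
  k <- -K:K
  sum(rho(k + i) * rho(k + j) + rho(k - i) * rho(k + j)
      - 2 * rho(i) * rho(k) * rho(k + j)
      - 2 * rho(j) * rho(k) * rho(k + i)
      + 2 * rho(i) * rho(j) * rho(k)^2)
}

# White noise: rho_0 = 1 and rho_k = 0 otherwise.
rho_wn <- function(k) as.numeric(k == 0)

# MA(1) with an assumed theta = 0.5, so rho_1 = -theta / (1 + theta^2) = -0.4.
rho_ma1 <- function(k, theta = 0.5) {
  ifelse(k == 0, 1, ifelse(abs(k) == 1, -theta / (1 + theta^2), 0))
}

print(bartlett_c(1, 1, rho_wn))    # c_11 for white noise
print(bartlett_c(1, 2, rho_wn))    # c_12 for white noise
print(bartlett_c(1, 1, rho_ma1))   # c_11 for the MA(1) example
print(bartlett_c(1, 2, rho_ma1))   # c_12 for the MA(1) example
```

Any autocorrelation function that decays quickly enough can be passed in as `rho`; for slowly decaying functions a larger K would be needed.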

Since Equation (6.1.2) is clearly difficult to interpret in its present generality, we
shall consider some important special cases and simplifications. Suppose first that $\{Y_t\}$
is white noise. Then Equation (6.1.2) reduces considerably, and we obtain

$$Var(r_k) \approx \frac{1}{n} \quad\text{and}\quad Corr(r_k, r_j) \approx 0 \text{ for } k \neq j$$  (6.1.3)
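A small Monte Carlo experiment gives a feel for Equation (6.1.3). This is a sketch under assumed settings: the sample size n = 100, the 2000 replications, normally distributed noise, and the seed are arbitrary choices for illustration.

```r
# Monte Carlo check of Var(r_k) ~ 1/n and Corr(r_1, r_2) ~ 0 for white noise.
# n, the number of replications, and the seed are illustrative assumptions.
set.seed(1)
n <- 100
reps <- 2000

# Each column holds (r_1, r_2) for one simulated white noise series.
# acf()$acf stores lag 0 first, so positions 2 and 3 are lags 1 and 2.
r <- replicate(reps, acf(rnorm(n), lag.max = 2, plot = FALSE)$acf[2:3])

var_r1 <- var(r[1, ])
cor_r1_r2 <- cor(r[1, ], r[2, ])

print(c(simulated = var_r1, theory = 1 / n))
print(cor_r1_r2)
```

The simulated variance should be near 1/n and the correlation near zero, up to Monte Carlo error and small finite-sample effects.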


Next suppose that $\{Y_t\}$ is generated by an AR(1) process with $\rho_k = \phi^k$ for $k > 0$.
Then, after considerable algebra and summing several geometric series, Equation
(6.1.2) with $i = j$ yields

$$Var(r_k) \approx \frac{1}{n}\left[\frac{(1 + \phi^2)(1 - \phi^{2k})}{1 - \phi^2} - 2k\phi^{2k}\right]$$  (6.1.4)

In particular,

$$Var(r_1) \approx \frac{1 - \phi^2}{n}$$  (6.1.5)

Notice that the closer $\phi$ is to ±1, the more precise our estimate of $\rho_1$ ($= \phi$) becomes.

For large lags, the terms in Equation (6.1.4) involving $\phi^k$ may be ignored, and we
have

$$Var(r_k) \approx \frac{1}{n}\left[\frac{1 + \phi^2}{1 - \phi^2}\right] \quad\text{for large } k$$  (6.1.6)

Notice that here, in contrast to Equation (6.1.5), values of $\phi$ close to ±1 imply large
variances for $r_k$. Thus we should not expect nearly as precise estimates of $\rho_k = \phi^k \approx 0$ for
large k as we do of $\rho_k = \phi^k$ for small k.

For the AR(1) model, Equation (6.1.2) can also be simplified (after much algebra)
for general $0 < i \le j$ as

$$c_{ij} = \frac{(\phi^{j-i} - \phi^{j+i})(1 + \phi^2)}{1 - \phi^2} + (j - i)\phi^{j-i} - (j + i)\phi^{j+i}$$  (6.1.7)

In particular, we find

$$Corr(r_1, r_2) \approx 2\phi\sqrt{\frac{1 - \phi^2}{1 + 2\phi^2 - 3\phi^4}}$$  (6.1.8)

Based on Equations (6.1.4) through (6.1.8), Exhibit 6.1 gives approximate standard
deviations and correlations for several lags and a few values of $\phi$ in AR(1) models.


Exhibit 6.1 Large Sample Results for Selected $r_k$ from an AR(1) Model

$\phi$      $\sqrt{Var(r_1)}$   $\sqrt{Var(r_2)}$   $Corr(r_1, r_2)$   $\sqrt{Var(r_{10})}$
±0.9    0.44/√n        0.807/√n       ±0.97           2.44/√n
±0.7    0.71/√n        1.12/√n        ±0.89           1.70/√n
±0.4    0.92/√n        1.11/√n        ±0.66           1.18/√n
±0.2    0.98/√n        1.04/√n        ±0.38           1.04/√n
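Entries of this kind can be computed directly from Equations (6.1.4) and (6.1.8). The function names below are hypothetical helpers introduced for this sketch; each returns the multiplier of 1/√n (or the correlation itself).

```r
# Large-sample standard deviation of r_k for an AR(1) model, Equation (6.1.4),
# expressed as a multiple of 1/sqrt(n). Helper names are assumptions.
ar1_sd_rk <- function(phi, k) {
  sqrt((1 + phi^2) * (1 - phi^(2 * k)) / (1 - phi^2) - 2 * k * phi^(2 * k))
}

# Large-sample correlation between r_1 and r_2, Equation (6.1.8).
ar1_corr_r1_r2 <- function(phi) {
  2 * phi * sqrt((1 - phi^2) / (1 + 2 * phi^2 - 3 * phi^4))
}

phi <- c(0.9, 0.7, 0.4, 0.2)
exhibit <- data.frame(
  phi     = phi,
  sd_r1   = ar1_sd_rk(phi, 1),
  sd_r2   = ar1_sd_rk(phi, 2),
  corr_12 = ar1_corr_r1_r2(phi),
  sd_r10  = ar1_sd_rk(phi, 10)
)
print(round(exhibit, 3))
```

For negative φ the standard deviations are unchanged and the correlation changes sign, which is why the exhibit lists ± values.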

For the MA(1) case, Equation (6.1.2) simplifies as follows:

$$c_{11} = 1 - 3\rho_1^2 + 4\rho_1^4 \quad\text{and}\quad c_{kk} = 1 + 2\rho_1^2 \text{ for } k > 1$$  (6.1.9)

Furthermore,

$$c_{12} = 2\rho_1(1 - \rho_1^2)$$  (6.1.10)

Based on these expressions, Exhibit 6.2 lists large-sample standard deviations and
correlations for the sample autocorrelations for several lags and several $\theta$-values. Notice
again that the sample autocorrelations can be highly correlated and that the standard
deviation of $r_k$ is larger for $k > 1$ than for $k = 1$.

Exhibit 6.2 Large-Sample Results for Selected $r_k$ from an MA(1) Model

$\theta$      $\sqrt{Var(r_1)}$   $\sqrt{Var(r_k)}$ for $k > 1$   $Corr(r_1, r_2)$
±0.9    0.71/√n        1.22/√n                  ∓0.86
±0.7    0.73/√n        1.20/√n                  ∓0.84
±0.5    0.79/√n        1.15/√n                  ∓0.74
±0.4    0.89/√n        1.11/√n                  ∓0.53
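The MA(1) expressions (6.1.9) and (6.1.10) are equally easy to evaluate. In this sketch the helper name `ma1_large_sample` is an assumption, and the relation ρ_1 = −θ/(1 + θ²) is the lag 1 autocorrelation of an MA(1) model written with the sign convention Y_t = e_t − θe_{t−1}.

```r
# Large-sample quantities for sample autocorrelations of an MA(1) model,
# from Equations (6.1.9) and (6.1.10). `ma1_large_sample` is a hypothetical helper.
ma1_large_sample <- function(theta) {
  rho1 <- -theta / (1 + theta^2)         # lag 1 autocorrelation, Y_t = e_t - theta * e_{t-1}
  c11 <- 1 - 3 * rho1^2 + 4 * rho1^4     # n * Var(r_1)
  ckk <- 1 + 2 * rho1^2                  # n * Var(r_k) for k > 1
  c12 <- 2 * rho1 * (1 - rho1^2)         # n * Cov(r_1, r_2)
  c(sd_r1 = sqrt(c11), sd_rk = sqrt(ckk), corr_12 = c12 / sqrt(c11 * ckk))
}

print(round(ma1_large_sample(0.9), 3))
print(round(ma1_large_sample(0.5), 3))
```

The standard deviations are multiples of 1/√n, and the correlation carries the opposite sign to θ because ρ_1 does.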

For a general MA(q) process and $i = j = k$, Equation (6.1.2) reduces to

$$c_{kk} = 1 + 2\sum_{j=1}^{q} \rho_j^2 \quad\text{for } k > q$$

so that

$$Var(r_k) = \frac{1}{n}\left[1 + 2\sum_{j=1}^{q} \rho_j^2\right] \quad\text{for } k > q$$  (6.1.11)

For an observed time series, we can replace $\rho$'s by r's, take the square root, and
obtain an estimated standard deviation of $r_k$, that is, the standard error of $r_k$ for large
lags. A test of the hypothesis that the series is MA(q) could be carried out by comparing
$r_k$ to plus and minus two standard errors. We would reject the null hypothesis if and only
if $r_k$ lies outside these bounds. In general, we should not expect the sample autocorrelation
to mimic the true autocorrelation in great detail. Thus, we should not be surprised to
see ripples or "trends" in $r_k$ that have no counterparts in the $\rho_k$.
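The standard error based on Equation (6.1.11) takes only a line of code. In this sketch the helper name `bartlett_se`, the sample autocorrelations in `r`, and the series length n = 120 are all hypothetical values chosen for illustration.

```r
# Standard error of r_k for k > q under an MA(q) hypothesis, from Equation (6.1.11)
# with sample autocorrelations substituted for the rho's.
# `bartlett_se` is a hypothetical helper; r and n below are made-up values.
bartlett_se <- function(r, q, n) {
  stopifnot(q >= 0, q <= length(r))
  sqrt((1 + 2 * sum(r[seq_len(q)]^2)) / n)
}

r <- c(-0.42, 0.08, -0.05, 0.11)   # hypothetical r_1, ..., r_4
n <- 120                           # hypothetical series length

# Under an MA(1) hypothesis, compare r_2, r_3, r_4 with +/- 2 standard errors.
se <- bartlett_se(r, q = 1, n = n)
outside <- abs(r[-1]) > 2 * se

print(se)
print(outside)
```

With q = 0 the function returns 1/√n, the white noise standard error of Equation (6.1.3).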


6.2  The  Partial  and  Extended  Autocorrelation  Functions 


Since for MA(q) models the autocorrelation function is zero for lags beyond q, the sample
autocorrelation is a good indicator of the order of the process. However, the autocorrelations
of an AR(p) model do not become zero after a certain number of lags; they
die off rather than cut off. So a different function is needed to help determine the order
of autoregressive models. Such a function may be defined as the correlation between $Y_t$
and $Y_{t-k}$ after removing the effect of the intervening variables $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots,
Y_{t-k+1}$. This coefficient is called the partial autocorrelation at lag k and will be denoted
by $\phi_{kk}$. (The reason for the seemingly redundant double subscript on $\phi_{kk}$ will become
apparent later on in this section.)

There are several ways to make this definition precise. If $\{Y_t\}$ is a normally distributed
time series, we can let

$$\phi_{kk} = Corr(Y_t, Y_{t-k} \mid Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1})$$  (6.2.1)

That is, $\phi_{kk}$ is the correlation in the bivariate distribution of $Y_t$ and $Y_{t-k}$ conditional on
$Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1}$.

An alternative approach, not based on normality, can be developed in the following
way. Consider predicting $Y_t$ based on a linear function of the intervening variables $Y_{t-1},
Y_{t-2}, \ldots, Y_{t-k+1}$, say, $\beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \cdots + \beta_{k-1} Y_{t-k+1}$, with the $\beta$'s chosen to
minimize the mean square error of prediction. If we assume that the $\beta$'s have been so
chosen and then think backward in time, it follows from stationarity that the best
"predictor" of $Y_{t-k}$ based on the same $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-k+1}$ will be
$\beta_1 Y_{t-k+1} + \beta_2 Y_{t-k+2} + \cdots + \beta_{k-1} Y_{t-1}$. The partial autocorrelation function at lag k is then
defined to be the correlation between the prediction errors; that is,

$$\phi_{kk} = Corr(Y_t - \beta_1 Y_{t-1} - \beta_2 Y_{t-2} - \cdots - \beta_{k-1} Y_{t-k+1},\ Y_{t-k} - \beta_1 Y_{t-k+1} - \beta_2 Y_{t-k+2} - \cdots - \beta_{k-1} Y_{t-1})$$  (6.2.2)

(For normally distributed series, it can be shown that the two definitions coincide.) By
convention, we take $\phi_{11} = \rho_1$.

As an example, consider $\phi_{22}$. It is shown in Appendix F on page 218 that the best
linear prediction of $Y_t$ based on $Y_{t-1}$ alone is just $\rho_1 Y_{t-1}$. Thus, according to Equation
(6.2.2), we will obtain $\phi_{22}$ by computing

$$Cov(Y_t - \rho_1 Y_{t-1},\ Y_{t-2} - \rho_1 Y_{t-1}) = \gamma_0(\rho_2 - \rho_1^2 - \rho_1^2 + \rho_1^2) = \gamma_0(\rho_2 - \rho_1^2)$$

Since

$$Var(Y_t - \rho_1 Y_{t-1}) = Var(Y_{t-2} - \rho_1 Y_{t-1}) = \gamma_0(1 + \rho_1^2 - 2\rho_1^2) = \gamma_0(1 - \rho_1^2)$$

we have that, for any stationary process, the lag 2 partial autocorrelation can be
expressed as

$$\phi_{22} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$  (6.2.3)

Consider now an AR(1) model. Recall that $\rho_k = \phi^k$ so that

$$\phi_{22} = \frac{\phi^2 - \phi^2}{1 - \phi^2} = 0$$

We shall soon see that for the AR(1) case, $\phi_{kk} = 0$ for all $k > 1$. Thus the partial
autocorrelation is nonzero for lag 1, the order of the AR(1) process, but is zero for all lags
greater than 1. We shall show this to be generally the case for AR(p) models. Sometimes
we say that the partial autocorrelation function for an AR(p) process cuts off after the
lag exceeds the order of the process.

Consider a general AR(p) case. It will be shown in Chapter 9 that the best linear
predictor of $Y_t$ based on a linear function of the variables $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \ldots, Y_{t-k+1}$
for $k > p$ is $\phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \cdots + \phi_p Y_{t-p}$. Also, the best linear predictor of $Y_{t-k}$ is
some function of $Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \ldots, Y_{t-k+1}$; call it $h(Y_{t-1}, Y_{t-2}, \ldots, Y_{t-p}, \ldots, Y_{t-k+1})$.
So the covariance between the two prediction errors is

$$Cov(Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-2} - \cdots - \phi_p Y_{t-p},\ Y_{t-k} - h(Y_{t-k+1}, Y_{t-k+2}, \ldots, Y_{t-1}))$$
$$\quad = Cov(e_t,\ Y_{t-k} - h(Y_{t-k+1}, Y_{t-k+2}, \ldots, Y_{t-1}))$$
$$\quad = 0 \quad\text{since } e_t \text{ is independent of } Y_{t-k}, Y_{t-k+1}, Y_{t-k+2}, \ldots, Y_{t-1}$$

Thus we have established the key fact that, for an AR(p) model,

$$\phi_{kk} = 0 \quad\text{for } k > p$$  (6.2.4)

For an MA(1) model, Equation (6.2.3) quickly yields

$$\phi_{22} = \frac{-\theta^2}{1 + \theta^2 + \theta^4}$$  (6.2.5)

Furthermore, for the MA(1) case, it may be shown that

$$\phi_{kk} = -\frac{\theta^k(1 - \theta^2)}{1 - \theta^{2(k+1)}} \quad\text{for } k \ge 1$$  (6.2.6)

Notice that the partial autocorrelation of an MA(1) model never equals zero but essentially
decays to zero exponentially fast as the lag increases, rather like the autocorrelation
function of the AR(1) process. More generally, it can be shown that the partial
autocorrelation of an MA(q) model behaves very much like the autocorrelation of an
AR(q) model.

A general method for finding the partial autocorrelation function for any stationary
process with autocorrelation function $\rho_k$ is as follows (see Anderson 1971, pp. 187-188,
for example). For a given lag k, it can be shown that the $\phi_{kk}$ satisfy the Yule-Walker
equations (which first appeared in Chapter 4 on page 79):

$$\rho_j = \phi_{k1}\rho_{j-1} + \phi_{k2}\rho_{j-2} + \phi_{k3}\rho_{j-3} + \cdots + \phi_{kk}\rho_{j-k} \quad\text{for } j = 1, 2, \ldots, k$$  (6.2.7)

More explicitly, we can write these k linear equations as

$$\begin{aligned}
\phi_{k1} + \rho_1\phi_{k2} + \rho_2\phi_{k3} + \cdots + \rho_{k-1}\phi_{kk} &= \rho_1 \\
\rho_1\phi_{k1} + \phi_{k2} + \rho_1\phi_{k3} + \cdots + \rho_{k-2}\phi_{kk} &= \rho_2 \\
&\ \ \vdots \\
\rho_{k-1}\phi_{k1} + \rho_{k-2}\phi_{k2} + \rho_{k-3}\phi_{k3} + \cdots + \phi_{kk} &= \rho_k
\end{aligned}$$  (6.2.8)

Here we are treating $\rho_1, \rho_2, \ldots, \rho_k$ as given and wish to solve for $\phi_{k1}, \phi_{k2}, \ldots, \phi_{kk}$
(discarding all but $\phi_{kk}$).

These equations yield $\phi_{kk}$ for any stationary process. However, if the process is in
fact AR(p), then since for $k = p$ Equations (6.2.8) are just the Yule-Walker equations
(page 79), which the AR(p) model is known to satisfy, we must have $\phi_{pp} = \phi_p$. In addition,
as we have already seen by an alternative derivation, $\phi_{kk} = 0$ for $k > p$. Thus the partial
autocorrelation effectively displays the correct order p of an autoregressive process
as the highest lag k before $\phi_{kk}$ becomes zero.


The Sample Partial Autocorrelation Function

For an observed time series, we need to be able to estimate the partial autocorrelation
function at a variety of lags. Given the relationships in Equations (6.2.8), an obvious
method is to estimate the $\rho$'s with sample autocorrelations, the corresponding r's, and
then solve the resulting linear equations for k = 1, 2, 3,... to get estimates of $\phi_{kk}$. We call
the estimated function the sample partial autocorrelation function (sample PACF)
and denote it by $\hat{\phi}_{kk}$.

Levinson (1947) and Durbin (1960) gave an efficient method for obtaining the solutions
to Equations (6.2.8) for either theoretical or sample partial autocorrelations. They
showed independently that Equations (6.2.8) can be solved recursively as follows:

$$\phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\,\rho_j}$$  (6.2.9)

where

$$\phi_{k,j} = \phi_{k-1,j} - \phi_{kk}\,\phi_{k-1,k-j} \quad\text{for } j = 1, 2, \ldots, k - 1$$

For example, using $\phi_{11} = \rho_1$ to get started, we have

$$\phi_{22} = \frac{\rho_2 - \phi_{11}\rho_1}{1 - \phi_{11}\rho_1} = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$

(as before) with $\phi_{21} = \phi_{11} - \phi_{22}\phi_{11}$, which is needed for the next step.

Then

$$\phi_{33} = \frac{\rho_3 - \phi_{21}\rho_2 - \phi_{22}\rho_1}{1 - \phi_{21}\rho_1 - \phi_{22}\rho_2}$$

We may thus calculate numerically as many values for $\phi_{kk}$ as desired. As stated,
these recursive equations give us the theoretical partial autocorrelations, but by replacing
$\rho$'s with r's, we obtain the estimated or sample partial autocorrelations.
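The recursion in Equation (6.2.9) translates almost line for line into code. The following is a minimal sketch: the function name `pacf_from_acf` and the parameter values φ = 0.6 and θ = 0.5 are assumptions for illustration. Note that R's `ARMAacf` writes the MA part with a plus sign, so the book's θ corresponds to `ma = -theta`.

```r
# Levinson-Durbin recursion, Equation (6.2.9): partial autocorrelations phi_kk
# from autocorrelations rho_1, ..., rho_K. `pacf_from_acf` is a hypothetical name.
pacf_from_acf <- function(rho) {
  K <- length(rho)
  phi <- matrix(0, K, K)               # phi[k, j] holds phi_{k,j}
  phi[1, 1] <- rho[1]                  # convention: phi_11 = rho_1
  if (K >= 2) {
    for (k in 2:K) {
      prev <- phi[k - 1, 1:(k - 1)]    # phi_{k-1,1}, ..., phi_{k-1,k-1}
      num <- rho[k] - sum(prev * rho[(k - 1):1])
      den <- 1 - sum(prev * rho[1:(k - 1)])
      phi[k, k] <- num / den
      phi[k, 1:(k - 1)] <- prev - phi[k, k] * rev(prev)
    }
  }
  diag(phi)
}

# AR(1) with an assumed phi = 0.6: the PACF should cut off after lag 1.
pacf_ar1 <- pacf_from_acf(0.6^(1:5))

# MA(1) with an assumed theta = 0.5: rho_1 = -theta / (1 + theta^2), rho_k = 0 for k > 1.
theta <- 0.5
pacf_ma1 <- pacf_from_acf(c(-theta / (1 + theta^2), 0, 0, 0))

# Closed form from Equation (6.2.6) and R's built-in theoretical PACF.
k <- 1:4
closed_form <- -theta^k * (1 - theta^2) / (1 - theta^(2 * (k + 1)))
pacf_builtin <- as.numeric(ARMAacf(ma = -theta, lag.max = 4, pacf = TRUE))

print(round(pacf_ar1, 6))
print(round(cbind(pacf_ma1, closed_form, pacf_builtin), 6))
```

Passing sample autocorrelations instead of theoretical ones gives the sample PACF, as described above.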

To assess the possible magnitude of the sample partial autocorrelations, Quenouille
(1949) has shown that, under the hypothesis that an AR(p) model is correct, the sample
partial autocorrelations at lags greater than p are approximately normally distributed
with zero means and variances 1/n. Thus, for $k > p$, $\pm 2/\sqrt{n}$ can be used as critical limits
on $\hat{\phi}_{kk}$ to test the null hypothesis that an AR(p) model is correct.

Mixed Models and the Extended Autocorrelation Function

Exhibit 6.3 summarizes the behavior of the autocorrelation and partial autocorrelation
functions that is useful in specifying models.

Exhibit 6.3 General Behavior of the ACF and PACF for ARMA Models

        AR(p)                  MA(q)                  ARMA(p,q), p>0, and q>0
ACF     Tails off              Cuts off after lag q   Tails off
PACF    Cuts off after lag p   Tails off              Tails off

The  Extended  Autocorrelation  Function 

The sample ACF and PACF provide effective tools for identifying pure AR(p) or MA(q)
models. However, for a mixed ARMA model, its theoretical ACF and PACF have infinitely
many nonzero values, making it difficult to identify mixed models from the sample
ACF and PACF. Many graphical tools have been proposed to make it easier to
identify the ARMA orders, for example, the corner method (Beguin et al., 1980), the
extended autocorrelation (EACF) method (Tsay and Tiao, 1984), and the smallest
canonical correlation (SCAN) method (Tsay and Tiao, 1985), among others. We shall
outline the EACF method, which seems to have good sampling properties for moderately
large sample sizes according to a comparative simulation study done by W. S.
Chan (1999).

The EACF method uses the fact that if the AR part of a mixed ARMA model is
known, “filtering out” the autoregression from the observed time series results in a pure
MA process that enjoys the cutoff property in its ACF. The AR coefficients may be estimated
by a finite sequence of regressions. We illustrate the procedure for the case where
the true model is an ARMA(1,1) model:

$$Y_t = \phi Y_{t-1} + e_t - \theta e_{t-1}$$

In this case, a simple linear regression of $Y_t$ on $Y_{t-1}$ results in an inconsistent estimator
of $\phi$, even with infinitely many data. Indeed, the theoretical regression coefficient
equals $\rho_1 = (\phi - \theta)(1 - \phi\theta)/(1 - 2\phi\theta + \theta^2)$, not $\phi$. But the residuals from this regression
do contain information about the error process $\{e_t\}$. A second multiple regression is performed
that consists of regressing $Y_t$ on $Y_{t-1}$ and on the lag 1 of the residuals from the
first regression. The coefficient of $Y_{t-1}$ in the second regression, denoted by $\tilde{\phi}$, turns
out to be a consistent estimator of $\phi$. Define $W_t = Y_t - \tilde{\phi} Y_{t-1}$, which is then approximately
an MA(1) process. For an ARMA(1,2) model, a third regression that regresses $Y_t$
on its lag 1, the lag 1 of the residuals from the second regression, and the lag 2 of the
residuals from the first regression leads to the coefficient of $Y_{t-1}$ being a consistent estimator
of $\phi$. Similarly, the AR coefficients of an ARMA(p,q) model can be consistently
estimated via a sequence of q regressions.
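
The two regressions for the ARMA(1,1) case can be tried on simulated data. This is a minimal sketch under our own assumptions: the sample size, seed, and parameter values are arbitrary, and the variable names are hypothetical. Note that `arima.sim()` writes the MA part with a plus sign, so the coefficient passed to it is $-\theta$ in the notation used here.

```r
set.seed(123)
n <- 5000
phi <- 0.6
theta <- -0.3          # Y_t = phi*Y_{t-1} + e_t - theta*e_{t-1}
y <- as.numeric(arima.sim(model = list(ar = phi, ma = -theta), n = n))

# First regression: Y_t on Y_{t-1}; its slope estimates rho_1, not phi
d1 <- data.frame(y = y[2:n], ylag = y[1:(n - 1)])
fit1 <- lm(y ~ 0 + ylag, data = d1)
beta1 <- unname(coef(fit1))
res1 <- c(NA, residuals(fit1))     # res1[t] is the first-stage residual at time t

# Second regression: Y_t on Y_{t-1} and the lag 1 of the first-stage residuals
d2 <- data.frame(y = y[3:n], ylag = y[2:(n - 1)], reslag = res1[2:(n - 1)])
fit2 <- lm(y ~ 0 + ylag + reslag, data = d2)
phi_tilde <- unname(coef(fit2)["ylag"])

# Theoretical limit of the first-stage slope
rho1 <- (phi - theta) * (1 - phi * theta) / (1 - 2 * phi * theta + theta^2)
```

With these settings `beta1` should settle near `rho1` (about 0.73) rather than near 0.6, while `phi_tilde` should be close to 0.6.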

As the AR and MA orders are unknown, an iterative procedure is required. Let

$$W_{t,k,j} = Y_t - \tilde{\phi}_1 Y_{t-1} - \cdots - \tilde{\phi}_k Y_{t-k} \qquad (6.2.10)$$

be the autoregressive residuals defined with the AR coefficients estimated iteratively
assuming the AR order is k and the MA order is j. The sample autocorrelations of $W_{t,k,j}$
are referred to as the extended sample autocorrelations. For k = p and j ≥ q, $\{W_{t,k,j}\}$ is
approximately an MA(q) model, so that its theoretical autocorrelations of lag q + 1 or
higher are equal to zero. For k > p, an overfitting problem occurs, and this increases the
MA order for the W process by the minimum of k − p and j − q. Tsay and Tiao (1984)
suggested summarizing the information in the sample EACF by a table with the element
in the kth row and jth column equal to the symbol X if the lag j + 1 sample correlation of
$W_{t,k,j}$ is significantly different from 0 (that is, if its magnitude is greater than
$1.96/\sqrt{n - j - k}$, since the sample autocorrelation is asymptotically $N(0, 1/(n - k - j))$ if
the W's are approximately an MA(j) process) and 0 otherwise. In such a table, an
ARMA(p,q) process will have a theoretical pattern of a triangle of zeros, with the upper
left-hand vertex corresponding to the ARMA orders. Exhibit 6.4 displays the schematic
pattern for an ARMA(1,1) model. The upper left-hand vertex of the triangle of zeros is
marked with the symbol 0 and is located in the p = 1 row and q = 1 column, an indication
of an ARMA(1,1) model.


Exhibit  6.4  Theoretical  Extended  ACF  (EACF)  for  an  ARMA(1,1)  Model 
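
The schematic triangle can be regenerated with a short sketch. It assumes the usual description of the Tsay and Tiao pattern, namely that the (k, j) entry is zero exactly when k ≥ p and j − q ≥ k − p; the function name `eacf_pattern` and the table size (8 rows by 14 columns) are our own choices.

```r
# Theoretical EACF pattern for an ARMA(p, q) model (assumed rule:
# entry (k, j) is "o" when k >= p and j - q >= k - p, and "x" otherwise).
eacf_pattern <- function(p, q, max_ar = 7, max_ma = 13) {
  k <- 0:max_ar
  j <- 0:max_ma
  is_zero <- outer(k, j, function(k, j) k >= p & (j - q) >= (k - p))
  tab <- ifelse(is_zero, "o", "x")
  dimnames(tab) <- list(AR = as.character(k), MA = as.character(j))
  tab
}

tab11 <- eacf_pattern(1, 1)
print(tab11, quote = FALSE)
```

The vertex of the triangle of o's sits in the AR = 1 row and MA = 1 column, and each further row starts its run of o's one column later.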


Of course, the sample EACF will never be this clear-cut. Displays like Exhibit 6.4
will contain 8 × 14 = 112 different estimated correlations, and some will be statistically
significantly different from zero by chance (see Exhibit 6.17 on page 124 for an example).
We will illustrate the use of the EACF in the next two sections and throughout the
remainder of the book.

6.3  Specification  of  Some  Simulated  Time  Series 


To illustrate the theory of Sections 6.1 and 6.2, we shall consider the sample autocorrelation
and sample partial autocorrelation of some simulated time series.

Exhibit 6.5 displays a graph of the sample autocorrelation out to lag 20 for the simulated
time series that we first saw in Exhibit 4.5 on page 61. This series, of length 120,
was generated from an MA(1) model with $\theta = 0.9$. From Exhibit 4.1 on page 58, the theoretical
autocorrelation at lag 1 is −0.4972. The estimated or sample value shown at lag
1 on the graph is −0.474. Using Exhibit 6.2 on page 112, the approximate standard error
of this estimate is $0.71/\sqrt{n} = 0.71/\sqrt{120} = 0.065$, so the estimate is well within two standard
errors of the true value.
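
The arithmetic above is easy to check. The sketch below assumes the large-sample variance of $r_1$ for an MA(1) process, $(1 - 3\rho_1^2 + 4\rho_1^4)/n$, which we take to be the source of the 0.71 factor; the variable names are ours.

```r
theta <- 0.9
n <- 120
rho1 <- -theta / (1 + theta^2)                       # theoretical lag 1 autocorrelation
se_r1 <- sqrt((1 - 3 * rho1^2 + 4 * rho1^4) / n)     # approximate standard error of r_1
r1 <- -0.474                                         # sample value read from the graph
z <- (r1 - rho1) / se_r1                             # distance in standard errors
round(c(rho1 = rho1, se = se_r1, z = z), 4)
```

The sample value lies well under one standard error from the theoretical value.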


Exhibit 6.5  Sample Autocorrelation of an MA(1) Process with $\theta = 0.9$


> data(ma1.1.s)

> win.graph(width=4.875, height=3, pointsize=8)

> acf(ma1.1.s, xaxp=c(0,20,10))


The dashed horizontal lines in Exhibit 6.5, plotted at $\pm 2/\sqrt{n} = \pm 0.1826$, are
intended to give critical values for testing whether or not the autocorrelation coefficients
are significantly different from zero. These limits are based on the approximate large
sample standard error that applies to a white noise process, namely $1/\sqrt{n}$. Notice that
the sample ACF values exceed these rough critical values at lags 1, 5, and 14. Of course,
the true autocorrelations at lags 5 and 14 are both zero.

Exhibit 6.6 displays the same sample ACF but with critical bounds based on plus
and minus two of the more complex standard errors implied by Equation (6.1.11) on
page 112. In using Equation (6.1.11), we replace $\rho$'s by $r$'s, let q equal 1, 2, 3, ... successively,
and take the square root to obtain these standard errors.
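
These bounds can be computed directly. The sketch assumes that Equation (6.1.11) has the form $Var(r_k) \approx (1 + 2\sum_{j=1}^{q} \rho_j^2)/n$ for k > q, so that the standard error at lag k uses $r_1, \ldots, r_{k-1}$; the function name `ma_bounds` and the simulated series are ours.

```r
# Standard errors for the sample ACF under successive MA(q) hypotheses.
# At lag k we take q = k - 1 and replace rho's by r's.
ma_bounds <- function(x, lag.max = 20) {
  n <- length(x)
  r <- acf(x, lag.max = lag.max, plot = FALSE)$acf[-1]     # r_1, ..., r_lag.max
  se <- sqrt((1 + 2 * cumsum(c(0, r[-length(r)]^2))) / n)
  data.frame(lag = seq_along(r), r = r, se = se, bound = 2 * se)
}

set.seed(42)
x <- arima.sim(model = list(ma = -0.9), n = 120)   # MA(1) with theta = 0.9 in the book's sign convention
b <- ma_bounds(x)
head(b, 5)
```

At lag 1 the bound reduces to the white noise value $2/\sqrt{n}$, and it can only widen as the lag increases. To our understanding, `acf(..., ci.type = 'ma')` draws bounds built from the same weights, using the normal quantile 1.96 in place of 2.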
Exhibit 6.6  Alternative Bounds for the Sample ACF for the MA(1) Process

[Figure: sample ACF plotted against Lag, lags 2 to 20]

> acf(ma1.1.s, ci.type='ma', xaxp=c(0,20,10))

Now the sample ACF value at lag 14 is insignificant and the one at lag 5 is just
barely significant. The lag 1 autocorrelation is still highly significant, and the information
given in these two plots taken together leads us to consider an MA(1) model for this
series. Remember that the model is tentative at this point and we would certainly want to
consider other “nearby” alternative models when we carry out model diagnostics.

As a second example, Exhibit 6.7 shows the sample ACF for the series shown in
Exhibit 4.2 on page 59, generated by an MA(1) model with $\theta = -0.9$. The critical values
based on the very approximate standard errors point to an MA(1) model for this series
also.

Exhibit 6.7  Sample Autocorrelation for an MA(1) Process with $\theta = -0.9$

[Figure: sample ACF plotted against Lag, lags 2 to 20]

> data(ma1.2.s); acf(ma1.2.s, xaxp=c(0,20,10))
For our third example, we use the data shown in Exhibit 4.8 on page 63, which were
simulated from an MA(2) model with $\theta_1 = 1$ and $\theta_2 = -0.6$. The sample ACF displays
significance at lags 1, 2, 5, 6, 7, and 14 when we use the simple standard error bounds.

Exhibit 6.8  Sample ACF for an MA(2) Process with $\theta_1 = 1$ and $\theta_2 = -0.6$

[Figure: sample ACF plotted against Lag, lags 2 to 20]

> data(ma2.s); acf(ma2.s, xaxp=c(0,20,10))

Exhibit 6.9 displays the sample ACF with the more sophisticated standard error
bounds. Now the lag 2 ACF is no longer significant, and it appears that an MA(1) may
be applicable. We will have to wait until we get further along in the model-building process
to see that the MA(2) model (the correct one) is the most appropriate model for
these data.


> acf(ma2.s, ci.type='ma', xaxp=c(0,20,10))
How do these techniques work for autoregressive models? Exhibit 6.10 gives the
sample ACF for the simulated AR(1) process we saw in Exhibit 4.13 on page 68. The
positive sample ACF values at lags 1, 2, and 3 reflect the strength of the lagged relationships
that we saw earlier in Exhibits 4.14, 4.15, and 4.16. However, notice that the sample
ACF decreases more nearly linearly than exponentially, whereas theory suggests
exponential decay. Also contrary to theory, the sample ACF goes negative at lag 10 and
remains so for many lags.


Exhibit 6.10  Sample ACF for an AR(1) Process with $\phi = 0.9$

[Figure: sample ACF plotted against Lag]

> data(ar1.s); acf(ar1.s, xaxp=c(0,20,10))


The sample partial autocorrelation (PACF) shown in Exhibit 6.11 gives a much
clearer picture about the nature of the generating model. Based on this graph, we would
certainly entertain an AR(1) model for this time series.


Exhibit 6.11  Sample Partial ACF for an AR(1) Process with $\phi = 0.9$

[Figure: sample PACF plotted against Lag]

> pacf(ar1.s, xaxp=c(0,20,10))


Exhibit 6.12 displays the sample ACF for our AR(2) time series. The time series
plot for this series was shown in Exhibit 4.19 on page 74. The sample ACF does look
somewhat like the damped wave that Equation (4.3.17) on page 73 and Exhibit 4.18
suggest. However, the sample ACF does not damp down nearly as quickly as theory predicts.


Exhibit 6.12  Sample ACF for an AR(2) Process with $\phi_1 = 1.5$ and $\phi_2 = -0.75$


> acf(ar2.s, xaxp=c(0,20,10))


The  sample  PACF  in  Exhibit  6.13  gives  a  strong  indication  that  we  should  consider 
an  AR(2)  model  for  these  data.  The  seemingly  significant  sample  PACF  at  lag  9  would 
need  to  be  investigated  further  during  model  diagnostics. 


Exhibit 6.13  Sample PACF for an AR(2) Process with $\phi_1 = 1.5$ and $\phi_2 = -0.75$

> pacf(ar2.s, xaxp=c(0,20,10))

As a final example, we simulated 100 values of a mixed ARMA(1,1) model with
$\phi = 0.6$ and $\theta = -0.3$. The time series plot is shown in Exhibit 6.14 and the sample ACF
and PACF are shown in Exhibit 6.15 and Exhibit 6.16, respectively. These seem to indicate
that an AR(1) model should be specified.


> acf(arma11.s, xaxp=c(0,20,10))


Exhibit 6.16  Sample PACF for Simulated ARMA(1,1) Series

[Figure: sample PACF plotted against Lag]

> pacf(arma11.s, xaxp=c(0,20,10))

However,  the  triangular  region  of  zeros  shown  in  the  sample  EACF  in  Exhibit  6.17 
indicates  quite  clearly  that  a  mixed  model  with  q  =  1  and  with  p  =  1  or  2  would  be  more 
appropriate.  We  will  illustrate  further  uses  of  the  EACF  when  we  specify  some  real 
series  in  Section  6.6. 

Exhibit 6.17  Sample EACF for Simulated ARMA(1,1) Series

AR/MA  0  1  2  3  4  5  6  7  8  9  10  11  12  13

0      x  x  x  x  o  o  o  o  o  o  o   o   o   o

1      x  o  o  o  o  o  o  o  o  o  o   o   o   o

2      x  o  o  o  o  o  o  o  o  o  o   o   o   o

3      x  x  o  o  o  o  o  o  o  o  o   o   o   o

4      x  o  x  o  o  o  o  o  o  o  o   o   o   o

5      x  o  o  o  o  o  o  o  o  o  o   o   o   o

6      x  o  o  o  x  o  o  o  o  o  o   o   o   o

7      x  o  o  o  x  o  o  o  o  o  o   o   o   o

> eacf(arma11.s)
6.4 Nonstationarity


As indicated in Chapter 5, many series exhibit nonstationarity that can be explained by
integrated ARMA models. The nonstationarity will frequently be apparent in the time
series plot of the series. A review of Exhibits 5.1, 5.5, and 5.8 is recommended here.

The  sample  ACF  computed  for  nonstationary  series  will  also  usually  indicate  the 
nonstationarity.  The  definition  of  the  sample  autocorrelation  function  implicitly 
assumes  stationarity;  for  example,  we  use  lagged  products  of  deviations  from  the  overall 
mean,  and  the  denominator  assumes  a  constant  variance  over  time.  Thus  it  is  not  at  all 
clear  what  the  sample  ACF  is  estimating  for  a  nonstationary  process.  Nevertheless,  for 
nonstationary  series,  the  sample  ACF  typically  fails  to  die  out  rapidly  as  the  lags 
increase.  This  is  due  to  the  tendency  for  nonstationary  series  to  drift  slowly,  either  up  or 
down, with apparent “trends.” The values of $r_k$ need not be large even for low lags, but
often they are.

Consider the oil price time series shown in Exhibit 5.1 on page 88. The sample ACF
for the logarithms of these data is displayed in Exhibit 6.18. All values shown are “significantly
far from zero,” and the only pattern is perhaps a linear decrease with increasing
lag. The sample PACF (not shown) is also indeterminate.


Exhibit 6.18  Sample ACF for the Oil Price Time Series

[Figure: sample ACF plotted against Lag]

> data(oil.price)

> acf(as.vector(oil.price), xaxp=c(0,24,12))


The sample ACF computed on the first differences of the logs of the oil price series
is shown in Exhibit 6.19. Now the pattern emerges much more clearly: after differencing,
a moving average model of order 1 seems appropriate. The model for the original
oil price series would then be a nonstationary IMA(1,1) model. (The “significant” ACF
values at lags 15, 16, and 20 are ignored for now.)

Exhibit 6.19  Sample ACF for the Difference of the Log Oil Price Series


> acf(diff(as.vector(log(oil.price))), xaxp=c(0,24,12))


If the first difference of a series and its sample ACF do not appear to support a stationary
ARMA model, then we take another difference and again compute the sample
ACF and PACF to look for characteristics of a stationary ARMA process. Usually one
or at most two differences, perhaps combined with a logarithm or other transformation,
will accomplish this reduction to stationarity. Additional properties of the sample ACF
computed on nonstationary data are given in Wichern (1973), Roy (1977), and Hasza
(1980). See also Box, Jenkins, and Reinsel (1994, p. 218).


Overdifferencing 

From  Exercise  2.6  on  page  20,  we  know  that  the  difference  of  any  stationary  time  series 
is  also  stationary.  However,  overdifferencing  introduces  unnecessary  correlations  into  a 
series  and  will  complicate  the  modeling  process. 

For example, suppose our observed series, $\{Y_t\}$, is in fact a random walk so that one
difference would lead to a very simple white noise model

$$\nabla Y_t = Y_t - Y_{t-1} = e_t$$

However, if we difference once more (that is, overdifference) we have

$$\nabla^2 Y_t = e_t - e_{t-1}$$

which is an MA(1) model but with $\theta = 1$. If we take two differences in this situation we
unnecessarily have to estimate the unknown value of $\theta$. Specifying an IMA(2,1) model
would not be appropriate here. The random walk model, which can be thought of as
IMA(1,1) with $\theta = 0$, is the correct model.† Overdifferencing also creates a noninvertible
model (see Section 4.5 on page 79).‡ Noninvertible models also create serious
problems when we attempt to estimate their parameters (see Chapter 7).

† The random walk model can also be thought of as an ARI(1,1) with $\phi = 0$ or as a
nonstationary AR(1) with $\phi = 1$.

To  illustrate  overdifferencing,  consider  the  random  walk  shown  in  Exhibit  2.1  on 
page  14.  Taking  one  difference  should  lead  to  white  noise — a  very  simple  model.  If  we 
mistakenly  take  two  differences  (that  is,  overdifference)  and  compute  the  sample  ACF, 
we  obtain  the  graph  shown  in  Exhibit  6.20.  Based  on  this  plot,  we  would  likely  specify 
at  least  an  IMA(2,1)  model  for  the  original  series  and  then  estimate  the  unnecessary  MA 
parameter.  We  also  have  a  significant  sample  ACF  value  at  lag  7  to  think  about  and  deal 
with. 


Exhibit 6.20  Sample ACF of Overdifferenced Random Walk

[Figure: sample ACF plotted against Lag, lags 2 to 16]

> data(rwalk)

> acf(diff(rwalk, differences=2), ci.type='ma', xaxp=c(0,18,9))


In contrast, Exhibit 6.21 displays the sample ACF of the first difference of the random
walk series. Viewing this graph, we would likely want to consider the correct
model, since the first difference looks very much like white noise.


‡ In backshift notation, if the correct model is $\phi(B)(1 - B)Y_t = \theta(B)e_t$, overdifferencing
leads to $\phi(B)(1 - B)^2 Y_t = \theta(B)(1 - B)e_t = \theta'(B)e_t$, say, where $\theta'(B) = (1 - B)\theta(B)$,
and the “forbidden” root in $\theta'(B)$ at B = 1 is obvious.
Exhibit 6.21  Sample ACF of Correctly Differenced Random Walk

[Figure: sample ACF plotted against Lag, lags 2 to 16]

> acf(diff(rwalk), ci.type='ma', xaxp=c(0,18,9))

To  avoid  overdifferencing,  we  recommend  looking  carefully  at  each  difference  in 
succession  and  keeping  the  principle  of  parsimony  always  in  mind — models  should  be 
simple,  but  not  too  simple. 
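
The effect of overdifferencing is easy to reproduce by simulation. The following sketch uses a freshly simulated random walk rather than the `rwalk` series; the seed and the series length are arbitrary choices of ours.

```r
set.seed(2024)
n <- 2000
rw <- cumsum(rnorm(n))                      # simulated random walk

# Lag 1 sample autocorrelation after one and after two differences
r1_once  <- acf(diff(rw), lag.max = 5, plot = FALSE)$acf[2]
r1_twice <- acf(diff(rw, differences = 2), lag.max = 5, plot = FALSE)$acf[2]

# An MA(1) process with theta = 1 has rho_1 = -theta / (1 + theta^2) = -0.5
theta <- 1
rho1_overdiff <- -theta / (1 + theta^2)
c(once = r1_once, twice = r1_twice, theory = rho1_overdiff)
```

The first difference shows essentially no lag 1 correlation, while the second difference shows a value near −0.5, correlation that was introduced purely by the extra differencing.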

The  Dickey-Fuller  Unit-Root  Test 

While  the  approximate  linear  decay  of  the  sample  ACF  is  often  taken  as  a  symptom  that 
the  underlying  time  series  is  nonstationary  and  requires  differencing,  it  is  also  useful  to 
quantify  the  evidence  of  nonstationarity  in  the  data-generating  mechanism.  This  can  be 
done  via  hypothesis  testing.  Consider  the  model 

$$Y_t = \alpha Y_{t-1} + X_t \quad \text{for } t = 1, 2, \ldots$$

where $\{X_t\}$ is a stationary process. The process $\{Y_t\}$ is nonstationary if the coefficient
$\alpha = 1$, but it is stationary if $|\alpha| < 1$. Suppose that $\{X_t\}$ is an AR(k) process:
$X_t = \phi_1 X_{t-1} + \cdots + \phi_k X_{t-k} + e_t$. Under the null hypothesis that $\alpha = 1$,
$X_t = Y_t - Y_{t-1}$. Letting $a = \alpha - 1$, we have

$$\begin{aligned}
Y_t - Y_{t-1} &= (\alpha - 1)Y_{t-1} + X_t \\
&= aY_{t-1} + \phi_1 X_{t-1} + \cdots + \phi_k X_{t-k} + e_t \qquad (6.4.1) \\
&= aY_{t-1} + \phi_1 (Y_{t-1} - Y_{t-2}) + \cdots + \phi_k (Y_{t-k} - Y_{t-k-1}) + e_t
\end{aligned}$$

where a = 0 under the hypothesis that $Y_t$ is difference nonstationary. On the other hand,
if $\{Y_t\}$ is stationary so that $-1 < \alpha < 1$, then it can be verified that $Y_t$ still satisfies an
equation similar to the equation above but with different coefficients; for example,
$a = -(1 - \phi_1 - \cdots - \phi_k)(1 - \alpha) < 0$. Indeed, $\{Y_t\}$ is then an AR(k + 1) process whose AR
characteristic equation is given by $\Phi(x)(1 - \alpha x) = 0$, where
$\Phi(x) = 1 - \phi_1 x - \cdots - \phi_k x^k$. So, the null hypothesis corresponds to the case where the
AR characteristic polynomial has a unit root and the alternative hypothesis states that it
has no unit roots. Consequently, the test for differencing amounts to testing for a unit
root in the AR characteristic polynomial of $\{Y_t\}$.

By the analysis above, the null hypothesis that $\alpha = 1$ (equivalently a = 0) can be
tested by regressing the first difference of the observed time series on lag 1 of the
observed series and on the past k lags of the first difference of the observed series. We
then test whether the coefficient a = 0, the null hypothesis being that the process is difference
nonstationary. That is, the process is nonstationary but becomes stationary after
first differencing. The alternative hypothesis is that a < 0 and hence $\{Y_t\}$ is stationary.
The augmented Dickey-Fuller (ADF) test statistic is the t-statistic of the estimated coefficient
of a from the method of least squares regression. However, the ADF test statistic
is not approximately t-distributed under the null hypothesis; instead, it has a certain nonstandard
large-sample distribution under the null hypothesis of a unit root. Fortunately,
percentage points of this limit (null) distribution have been tabulated; see Fuller (1996).

In  practice,  even  after  first  differencing,  the  process  may  not  be  a  finite-order  AR 
process,  but  it  may  be  closely  approximated  by  some  AR  process  with  the  AR  order 
increasing  with  the  sample  size.  Said  and  Dickey  (1984)  (see  also  Chang  and  Park, 
2002)  showed  that  with  the  AR  order  increasing  with  the  sample  size,  the  ADF  test  has 
the  same  large-sample  null  distribution  as  the  case  where  the  first  difference  of  the  time 
series is a finite-order AR process. Often, the approximating AR order can be first estimated
based on some information criteria (for example, AIC or BIC) before carrying
out  the  ADF  test.  See  Section  6.5  on  page  130  for  more  information  on  the  AIC  and  BIC 
criteria. 

In  some  cases,  the  process  may  be  trend  nonstationary  in  the  sense  that  it  has  a 
deterministic  trend  (for  example,  some  linear  trend)  but  otherwise  is  stationary.  A 
unit-root  test  may  be  conducted  with  the  aim  of  discerning  difference  stationarity  from 
trend  stationarity.  This  can  be  done  by  carrying  out  the  ADF  test  with  the  detrended 
data.  Equivalently,  this  can  be  implemented  by  regressing  the  first  difference  on  the 
covariates  defining  the  trend,  the  lag  1  of  the  original  data,  and  the  past  lags  of  the  first 
difference of the original data. The t-statistic based on the coefficient estimate of the lag
1  of  the  original  data  furnishes  the  ADF  test  statistic,  which  has  another  nonstandard 
large-sample  null  distribution.  See  Phillips  and  Xiao  (1998)  for  a  survey  of  unit  root 
testing. 

We now illustrate the ADF test with the simulated random walk shown in Exhibit
2.1 on page 14. First, we consider testing the null hypothesis of a unit root versus the
alternative hypothesis that the time series is stationary with unknown mean. Hence, the
regression defined by Equation (6.4.1) is augmented with an intercept to allow for the
possibly nonzero mean under the alternative hypothesis. (For the alternative hypothesis
that the process is a stationary process of zero mean, the ADF test statistic can be
obtained by running the unaugmented regression defined by Equation (6.4.1).) To carry
out the test, we must determine k.† Using the AIC with the first difference of the data,
we find that k = 8, in which case the ADF test statistic becomes −0.601, with the p-value
being greater than 0.1.‡ On the other hand, setting k = 0 (the true order) leads to the
ADF statistic −1.738, with p-value still greater than 0.1.§ Thus, there is strong evidence
supporting the unit-root hypothesis. Second, recall that the simulated random walk
appears to have a linear trend. Hence, linear trend plus stationary error forms another
reasonable alternative to the null hypothesis of unit root (difference nonstationarity). For
this test, we include both an intercept term and the covariate time in the regression
defined by Equation (6.4.1). With k = 8, the ADF test statistic equals −2.289 with
p-value greater than 0.1;¶ that is, we do not reject the null hypothesis of unit root. On
the other hand, setting k = 0, the true order that is unknown in practice, the ADF test statistic
becomes −3.49 with p-value equal to 0.0501.# Hence, there is weak evidence that
the process is linear-trend nonstationary; that is, the process equals linear time trend
plus stationary error, contrary to the truth that the process is a random walk, being difference
nonstationary! This example shows that with a small sample size, it may be hard
to differentiate between trend nonstationarity and difference nonstationarity.

† R code: ar(diff(rwalk))
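
The regression behind the ADF statistic can also be run directly with `lm()`. The following is a minimal sketch, not a replacement for a packaged test: the function name `adf_stat` and its arguments are our own, it returns only the t-statistic of the coefficient a, and the result must be compared with the nonstandard percentage points tabulated for the unit-root null (for example in Fuller, 1996), not with the usual t tables.

```r
# ADF test statistic from the regression in Equation (6.4.1).
# type = "none":      no deterministic terms
# type = "intercept": stationary alternative with unknown mean
# type = "trend":     linear trend plus stationary error alternative
adf_stat <- function(y, k = 0, type = c("intercept", "trend", "none")) {
  type <- match.arg(type)
  y <- as.numeric(y)
  n <- length(y)
  dy <- diff(y)                          # dy[i] = y[i + 1] - y[i]
  E <- embed(dy, k + 1)                  # columns: current difference, then lags 1..k
  d <- data.frame(dy = E[, 1],
                  ylag = y[(k + 1):(n - 1)],   # lag 1 of the original series
                  tt = seq_len(nrow(E)))       # time covariate
  lag_names <- character(0)
  for (i in seq_len(k)) {
    lag_names[i] <- paste0("dlag", i)
    d[[lag_names[i]]] <- E[, i + 1]
  }
  rhs <- c(if (type == "none") "0", if (type == "trend") "tt", "ylag", lag_names)
  fit <- lm(as.formula(paste("dy ~", paste(rhs, collapse = " + "))), data = d)
  summary(fit)$coefficients["ylag", "t value"]
}

set.seed(1)
rw <- cumsum(rnorm(200))                             # unit-root process
st <- arima.sim(model = list(ar = 0.2), n = 200)     # stationary process
stat_rw <- adf_stat(rw, k = 1, type = "intercept")
stat_st <- adf_stat(st, k = 1, type = "intercept")
```

For the stationary series the statistic is strongly negative, while for the random walk it is typically much closer to zero, which is the contrast the test exploits.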

6.5  Other  Specification  Methods 


A number of other approaches to model specification have been proposed since Box and
Jenkins’ seminal work. One of the most studied is Akaike’s (1973) Information Criterion
(AIC). This criterion says to select the model that minimizes


$$\text{AIC} = -2\log(\text{maximum likelihood}) + 2k \qquad (6.5.1)$$

where k = p + q + 1 if the model contains an intercept or constant term and k = p + q otherwise.
Maximum likelihood estimation is discussed in Chapter 7. The addition of the
term 2(p + q + 1) or 2(p + q) serves as a “penalty function” to help ensure selection of
parsimonious models and to avoid choosing models with too many parameters.

The AIC is an estimator of the average Kullback-Leibler divergence of the estimated
model from the true model. Let $p(y_1, y_2, \ldots, y_n)$ be the true pdf of $Y_1, Y_2, \ldots, Y_n$,
and $q_\theta(y_1, y_2, \ldots, y_n)$ be the corresponding pdf under the model with parameter $\theta$. The
Kullback-Leibler divergence of $q_\theta$ from p is defined by the formula

$$D(p, q_\theta) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(y_1, y_2, \ldots, y_n) \log\left[\frac{p(y_1, y_2, \ldots, y_n)}{q_\theta(y_1, y_2, \ldots, y_n)}\right] dy_1\, dy_2 \cdots dy_n$$

The AIC estimates $E[D(p, q_{\hat{\theta}})]$, where $\hat{\theta}$ is the maximum likelihood estimator of the
vector parameter $\theta$. However, the AIC is a biased estimator, and the bias can be appreciable
for large parameter per data ratios. Hurvich and Tsai (1989) showed that the bias
can be approximately eliminated by adding another nonstochastic penalty term to the
AIC, resulting in the corrected AIC, denoted by AICc and defined by the formula


‡ R code: library(uroot); ADF.test(rwalk, selectlags=list(mode=c(1,2,3,4,5,6,7,8), Pmax=8), itsd=c(1,0,0))

§ ADF.test(rwalk, selectlags=list(Pmax=0), itsd=c(1,0,0))

¶ ADF.test(rwalk, selectlags=list(mode=c(1,2,3,4,5,6,7,8), Pmax=8), itsd=c(1,1,0))

# ADF.test(rwalk, selectlags=list(Pmax=0), itsd=c(1,1,0))
$$\text{AICc} = \text{AIC} + \frac{2(k + 1)(k + 2)}{n - k - 2} \qquad (6.5.2)$$

Here n is the (effective) sample size and again k is the total number of parameters as
above excluding the noise variance. Simulation results by Hurvich and Tsai (1989) suggest
that for cases with k/n greater than 10%, the AICc outperforms many other model
selection criteria, including both the AIC and BIC.

Another approach to determining the ARMA orders is to select a model that minimizes
the Schwarz Bayesian Information Criterion (BIC) defined as

$$\text{BIC} = -2\log(\text{maximum likelihood}) + k\log(n) \qquad (6.5.3)$$

If the true process follows an ARMA(p,q) model, then it is known that the orders specified
by minimizing the BIC are consistent; that is, they approach the true orders as the
sample size increases. However, if the true process is not a finite-order ARMA process,
then minimizing AIC among an increasingly large class of ARMA models enjoys the
appealing property that it will lead to an optimal ARMA model that is closest to the true
process among the class of models under study.†
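
The three criteria in Equations (6.5.1) to (6.5.3) can be computed from any fitted model's maximized log-likelihood. In the sketch below the helper `info_criteria`, the simulated series, and the list of candidate orders are our own assumptions. We count k as the number of ARMA coefficients plus the intercept, excluding the noise variance; R's own `AIC()` for `arima` fits also counts the noise variance, so its values differ from these by a constant.

```r
# AIC, AICc, and BIC from a maximized log-likelihood,
# k parameters (noise variance excluded), and sample size n.
info_criteria <- function(loglik, k, n) {
  aic <- -2 * loglik + 2 * k
  c(AIC = aic,
    AICc = aic + 2 * (k + 1) * (k + 2) / (n - k - 2),
    BIC = -2 * loglik + k * log(n))
}

set.seed(7)
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 100)

# Candidate (p, d, q) orders, each fitted by maximum likelihood with an intercept
orders <- list(c(1, 0, 0), c(0, 0, 1), c(1, 0, 1), c(2, 0, 0))
tab <- t(sapply(orders, function(ord) {
  fit <- arima(y, order = ord, method = "ML")
  info_criteria(fit$loglik, k = length(coef(fit)), n = length(y))
}))
rownames(tab) <- sapply(orders, function(o) sprintf("ARMA(%d,%d)", o[1], o[3]))
round(tab, 2)
```

Each column is read the same way: the preferred model is the row with the smallest value. The AICc always exceeds the AIC, with the gap growing as k/n grows.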

Regardless  of  whether  we  use  the  AIC  or  BIC,  the  methods  require  carrying  out 
maximum  likelihood  estimation.  However,  maximum  likelihood  estimation  for  an 
ARMA  model  is  prone  to  numerical  problems  due  to  multimodality  of  the  likelihood 
function  and  the  problem  of  overfitting  when  the  AR  and  MA  orders  exceed  the  true 
orders.  Hannan  and  Rissanen  (1982)  proposed  an  interesting  and  practical  solution  to 
this  problem.  Their  procedure  consists  of  first  fitting  a  high-order  AR  process  with  the 
order  determined  by  minimizing  the  AIC.  The  second  step  uses  the  residuals  from  the 
first step as proxies for the unobservable error terms. Thus, an ARMA(k, j) model can be
approximately estimated by regressing the time series on its own lags 1 to k together
with the lags 1 to j of the residuals from the high order autoregression; the BIC of this
autoregressive model is an estimate of the BIC obtained with maximum likelihood estimation.
Hannan and Rissanen (1982) demonstrated that minimizing the approximate
BIC still leads to consistent estimation of the ARMA orders.
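
A rough version of this two-step procedure can be written with `ar()` and least squares. This is a sketch under our own assumptions, not the authors' implementation: the function name `hr_bic` is hypothetical, the regression BIC is approximated by $m\log(\text{RSS}/m) + (k + j)\log(m)$ with m the number of usable observations (constants common to all models are dropped), and every candidate model is fitted on the same observations.

```r
# Approximate BIC table for ARMA(k, j) models in the spirit of Hannan and Rissanen.
hr_bic <- function(y, max_p = 2, max_q = 2, ar_max = 15) {
  y <- as.numeric(y) - mean(y)
  # Step 1: high-order AR fit with order chosen by AIC; residuals proxy the errors
  ar_fit <- ar(y, order.max = ar_max, aic = TRUE)
  e <- as.numeric(ar_fit$resid)            # first ar_fit$order values are NA
  n <- length(y)
  start <- ar_fit$order + max(max_p, max_q) + 1
  idx <- start:n                           # common estimation sample
  m <- length(idx)
  out <- matrix(NA_real_, max_p + 1, max_q + 1,
                dimnames = list(AR = as.character(0:max_p),
                                MA = as.character(0:max_q)))
  # Step 2: regress Y_t on its lags 1..k and on lags 1..j of the residuals
  for (k in 0:max_p) for (j in 0:max_q) {
    X <- cbind(if (k > 0) sapply(1:k, function(l) y[idx - l]),
               if (j > 0) sapply(1:j, function(l) e[idx - l]))
    rss <- if (is.null(X)) sum(y[idx]^2) else sum(lm.fit(X, y[idx])$residuals^2)
    out[k + 1, j + 1] <- m * log(rss / m) + (k + j) * log(m)
  }
  out
}

set.seed(11)
y <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)
bic_tab <- hr_bic(y)
round(bic_tab, 1)
best <- which(bic_tab == min(bic_tab), arr.ind = TRUE)   # row and column of the minimum
```

The entry with the smallest value suggests tentative orders, to be checked later with full maximum likelihood estimation and diagnostics.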

Order determination is related to the problem of finding the subset of nonzero coefficients
of an ARMA model with sufficiently high ARMA orders. A subset ARMA(p,q)
model is an ARMA(p,q) model with a subset of its coefficients known to be zero. For
example, the model

$$Y_t = 0.8Y_{t-12} + e_t + 0.1e_{t-12} \qquad (6.5.4)$$

is a subset ARMA(12,12) model useful for modeling some monthly seasonal time
series. For ARMA models of very high orders, such as the preceding ARMA(12,12)
model, finding a subset ARMA model that adequately approximates the underlying process
is more important from a practical standpoint than simply determining the ARMA
orders. The method of Hannan and Rissanen (1982) for estimating the ARMA orders
can be extended to solving the problem of finding an optimal subset ARMA model.


1 Closeness is measured in terms of the Kullback-Leibler divergence, a measure of disparity
between models. See Shibata (1976) and the discussion in Stenseth et al. (2004).

Indeed, several model selection criteria (including AIC and BIC) of the subset
ARMA(p, q) models ($2^{p+q}$ of them!) can be approximately, exhaustively, and quickly
computed by the method of regression by leaps and bounds (Furnival and Wilson, 1974)
applied to the subset regression of $Y_t$ on its own lags and on lags of the residuals from a
high-order autoregression of $\{Y_t\}$.

It is prudent to examine a few best subset ARMA models (in terms of, for example,
BIC) in order to arrive at some helpful tentative models for further study. The pattern of
which lags of the observed time series and which of the error process enter into the various
best subset models can be summarized succinctly in a display like that shown in
Exhibit 6.22. This table is based on a simulation of the ARMA(12,12) model shown in
Equation (6.5.4). Each row in the exhibit corresponds to a subset ARMA model where
the cells of the variables selected for the model are shaded. The models are sorted
according to their BIC, with better models (lower BIC) placed in higher rows and with
darker shades. The top row tells us that the subset ARMA(14,14) model with the smallest
BIC contains only lags 8 and 12 of the observed time series and lag 12 of the error
process. The next best model contains lag 12 of the time series and lag 8 of the errors,
while the third best model contains lags 4, 8, and 12 of the time series and lag 12 of the
errors. In our simulated time series, the second best model is the true subset model.
However, the BIC values for these three models are all very similar, and all three (plus
the fourth best model) are worthy of further study. Note also that lag 12 of the time series
and lag 12 of the errors are the two variables most frequently found in the various subset
models summarized in the exhibit, suggesting that perhaps they are the more important
variables, as we know they are!


Exhibit 6.22 Best Subset ARMA Selection Based on BIC

> set.seed(92397)
> test=arima.sim(model=list(ar=c(rep(0,11),.8),
    ma=c(rep(0,11),0.7)),n=120)
> res=armasubsets(y=test,nar=14,nma=14,y.name='test',
    ar.method='ols')
> plot(res)

6.6  Specification  of  Some  Actual  Time  Series 

Consider  now  specification  of  models  for  some  of  the  actual  time  series  that  we  saw  in 
earlier  chapters. 

The  Los  Angeles  Annual  Rainfall  Series 

Annual  total  rainfall  amounts  for  Los  Angeles  were  shown  in  Exhibit  1.1  on  page  2.  In 
Chapter 3, we noted in Exhibit 3.17 on page 50 that rainfall amounts were not normally
distributed. As is shown in Exhibit 6.23, taking logarithms improves the normality
dramatically.

Exhibit  6.23  QQ  Normal  Plot  of  the  Logarithms  of  LA  Annual  Rainfall 


[Plot omitted. Horizontal axis: Theoretical Quantiles]

> data(larain); win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(log(larain)); qqline(log(larain))

Exhibit 6.24 displays the sample autocorrelations for the logarithms of the annual
rainfall series.


Exhibit 6.24 Sample ACF of the Logarithms of LA Annual Rainfall

> win.graph(width=4.875,height=3,pointsize=8)
> acf(log(larain),xaxp=c(0,20,10))


The log transformation has improved the normality, but there is no discernible
dependence in this time series. We could model the logarithms of the annual rainfall amounts
as independent, normal random variables with mean 2.58 and standard deviation 0.478.
Both these values are in units of log(inches).

The  Chemical  Process  Color  Property  Series 

The industrial chemical process color property displayed in Exhibit 1.3 on page 3
shows more promise of interesting time series modeling, especially in light of the
dependence of successive batches shown in Exhibit 1.4 on page 4. The sample ACF
plotted in Exhibit 6.25 might at first glance suggest an MA(1) model, as only the lag 1
autocorrelation is significantly different from zero.

However, the damped sine wave appearance of the plot encourages us to look further
at the sample partial autocorrelation. Exhibit 6.26 displays that plot, and now we
see clearly that an AR(1) model is worthy of first consideration. As always, our specified
models are tentative and subject to modification during the model diagnostics stage
of model building.

[Plot omitted. Horizontal axis: Lag]

> pacf(color)

The Annual Abundance of Canadian Hare Series

The time series of annual abundance of hare of the Hudson Bay in Canada was displayed
in Exhibit 1.5 on page 5, and the year-to-year dependence was demonstrated in
Exhibit 1.6. It has been suggested in the literature that a transformation might be used to
produce a good model for these data. Exhibit 6.27 displays the log-likelihood as a function
of the power parameter, $\lambda$. The maximum occurs at $\lambda = 0.4$, but a square root
transformation with $\lambda = 0.5$ is well within the confidence interval for $\lambda$. We will take the
square root of the abundance values for all further analyses.


Exhibit 6.27 Box-Cox Power Transformation Results for Hare Abundance

> win.graph(width=3,height=3,pointsize=8)
> data(hare); BoxCox.ar(hare)


Exhibit  6.28  shows  the  sample  ACF  for  this  transformed  series.  The  fairly  strong 
lag 1 autocorrelation dominates but, again, there is a strong indication of damped
oscillatory behavior.

The sample partial autocorrelation for the transformed series is shown in Exhibit
6.29. It gives strong evidence to support an AR(2) or possibly an AR(3) model for these
data.

[Plot omitted. Horizontal axis: Lag]

> pacf(hare^.5)

The Oil Price Series

In Chapter 5, we began to look at the monthly oil price time series and argued graphically
that the difference of the logarithms could be considered stationary (see Exhibit
5.1 on page 88). Software implementation of the Augmented Dickey-Fuller unit-root test
applied to the logs of the original prices leads to a test statistic of -1.1119 and a p-value
of 0.9189. With stationarity as the alternative hypothesis, this provides strong evidence
of nonstationarity and the appropriateness of taking a difference of the logs. For this
test, the software chose a value of k = 6 in Equation (6.4.1) on page 128 based on
large-sample theory.
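
To make the mechanics of such a unit-root statistic concrete, the following base R sketch computes the t-ratio on the lagged level from a regression of the first difference on the lagged level and k lagged differences. It is an illustration only: the helper name adf_stat, the inclusion of an intercept (and no trend term), and the simulated series are assumptions made here, so it is not expected to reproduce the statistic quoted above. The resulting t-ratio must be compared with Dickey-Fuller critical values, not with the usual t table.

```r
# Sketch: Dickey-Fuller type t-ratio via ordinary least squares (base R only).
adf_stat <- function(y, k) {
  y  <- as.numeric(y)
  dy <- diff(y)                 # first differences
  n  <- length(dy)
  E  <- embed(dy, k + 1)        # columns: dy_t, dy_{t-1}, ..., dy_{t-k}
  resp <- E[, 1]                # dy_t
  ylag <- y[(k + 1):n]          # lagged level y_{t-1}, aligned with resp
  fit <- if (k > 0) {
    dlags <- E[, -1, drop = FALSE]
    lm(resp ~ ylag + dlags)     # augmented regression
  } else {
    lm(resp ~ ylag)             # k = 0: no augmentation
  }
  unname(coef(summary(fit))["ylag", "t value"])
}

set.seed(123)
rw <- cumsum(rnorm(200))        # random walk: has a unit root
wn <- rnorm(200)                # white noise: stationary

adf_stat(rw, k = 6)             # typically not far below zero
adf_stat(wn, k = 0)             # strongly negative
```

For the white noise series the coefficient on the lagged level is close to -1, which gives a large negative t-ratio; for the random walk the coefficient is close to 0, so the unit-root hypothesis is not rejected.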

Exhibit 6.30 shows the summary EACF table for the differences of the logarithms
of the oil price data. This table suggests an ARMA model with p = 0 and q = 1.


Exhibit 6.30 Extended ACF for Difference of Logarithms of Oil Price Series

The results of the best subsets ARMA approach are displayed in Exhibit 6.31.


Exhibit 6.31 Best Subset ARMA Model for Difference of Log(Oil)

> res=armasubsets(y=diff(log(oil.price)),nar=7,nma=7,
    y.name='test',ar.method='ols')
> plot(res)

Here the suggestion is that $Y_t = \nabla \log(\mathrm{Oil}_t)$ should be modeled in terms of $Y_{t-1}$ and
$Y_{t-4}$ and that no lags are needed in the error terms. The second best model omits the lag
4 term so that an ARIMA(1,1,0) model on the logarithms should also be investigated
further.

Exhibit 6.32 suggests that we specify an MA(1) model for the difference of the log
oil prices, and Exhibit 6.33 says to consider an AR(2) model (ignoring some significant
spikes at lags 15, 16, and 20). We will want to look at all of these models further when
we estimate parameters and perform diagnostic tests in Chapters 7 and 8. (We will see
later that to obtain a suitable model for the oil price series, the outliers in the series will
need to be dealt with. Can you spot the outliers in Exhibit 5.4 on page 91?)


[Plot omitted. Horizontal axis: Lag]

> acf(as.vector(diff(log(oil.price))),xaxp=c(0,22,11))


Exhibit 6.33 Sample PACF of Difference of Logged Oil Prices

> pacf(as.vector(diff(log(oil.price))),xaxp=c(0,22,11))

6.7 Summary


In this chapter, we considered the problem of specifying reasonable but simple models
for observed time series. In particular, we investigated tools for choosing the orders (p,
d, and q) for ARIMA(p,d,q) models. Three tools, the sample autocorrelation function,
the sample partial autocorrelation function, and the sample extended autocorrelation
function, were introduced and studied to help with this difficult task. The Dickey-Fuller
unit-root test was also introduced to help distinguish between stationary and nonstationary
series. These ideas were all illustrated with both simulated and actual time series.

6.1 Verify Equation (6.1.3) on page 110 for the white noise process.

6.2 Verify Equation (6.1.4) on page 110 for the AR(1) process.

6.3 Verify the line in Exhibit 6.1 on page 111 for the values $\phi = \pm 0.9$.

6.4 Add new entries to Exhibit 6.1 on page 111 for the following values:

(a) $\phi = \pm 0.99$.

(b) $\phi = \pm 0.5$.

(c) $\phi = \pm 0.1$.

6.5 Verify Equation (6.1.9) on page 111 and Equation (6.1.10) for the MA(1) process.

6.6 Verify the line in Exhibit 6.2 on page 112 for the values $\theta = \pm 0.9$.

6.7 Add new entries to Exhibit 6.2 on page 112 for the following values:

(a) $\theta = \pm 0.99$.

(b) $\theta = \pm 0.8$.

(c) $\theta = \pm 0.2$.

6.8 Verify Equation (6.1.11) on page 112 for the general MA(q) process.

6.9 Use Equation (6.2.3) on page 113 to verify the value for the lag 2 partial autocorrelation
function for the MA(1) process given in Equation (6.2.5) on page 114.

6.10  Show  that  the  general  expression  for  the  partial  autocorrelation  function  of  an 
MA(1)  process  given  in  Equation  (6.2.6)  on  page  114,  satisfies  the  Yule-Walker 
recursion  given  in  Equation  (6.2.7). 

6.11 Use Equation (6.2.8) on page 114 to find the (theoretical) partial autocorrelation
function for an AR(2) model in terms of $\phi_1$ and $\phi_2$ and lag k = 1, 2, 3, ... .

6.12 From a time series of 100 observations, we calculate $r_1 = -0.49$, $r_2 = 0.31$, $r_3 =
-0.21$, $r_4 = 0.11$, and $|r_k| < 0.09$ for k > 4. On this basis alone, what ARIMA
model would we tentatively specify for the series?

6.13 A stationary time series of length 121 produced sample partial autocorrelation of
$\hat{\phi}_{11} = 0.8$, $\hat{\phi}_{22} = -0.6$, $\hat{\phi}_{33} = 0.08$, and $\hat{\phi}_{44} = 0.00$. Based on this information
alone, what model would we tentatively specify for the series?

6.14 For a series of length 169, we find that $r_1 = 0.41$, $r_2 = 0.32$, $r_3 = 0.26$, $r_4 = 0.21$,
and $r_5 = 0.16$. What ARIMA model fits this pattern of autocorrelations?

6.15 The sample ACF for a series and its first difference are given in the following
table. Here n = 100.

Lag                        1      2      3      4      5      6
ACF for $Y_t$           0.97   0.97   0.93   0.85   0.80   0.71
ACF for $\nabla Y_t$   -0.42   0.18  -0.02   0.07  -0.10  -0.09

Based on this information alone, which ARIMA model(s) would we consider for
the series?

6.16  For  a  series  of  length  64,  the  sample  partial  autocorrelations  are  given  as: 

Lag  1  2  3  4  5 

PACF  0.47  -0.34  0.20  0.02  -0.06 

Which  models  should  we  consider  in  this  case? 

6.17 Consider an AR(1) series of length 100 with $\phi = 0.7$.

(a) Would you be surprised if $r_1 = 0.6$?

(b)  Would  =  -0.15  be  unusual? 

6.18 Suppose that $\{X_t\}$ is a stationary AR(1) process with parameter $\phi$ but that we can
only observe $Y_t = X_t + N_t$ where $\{N_t\}$ is the white noise measurement error independent
of $\{X_t\}$.

(a) Find the autocorrelation function for the observed process in terms of $\phi$, $\sigma_X^2$,
and $\sigma_N^2$.

(b) Which ARIMA model might we specify for $\{Y_t\}$?

6.19 The time plots of two series are shown below.

(a) For each of the series, describe $r_1$ using the terms strongly positive, moderately
positive, near zero, moderately negative, or strongly negative. Do you
need to know the scale of measurement for the series to answer this?

(b) Repeat part (a) for $r_2$.

6.20 Simulate an AR(1) time series with n = 48 and with $\phi = 0.7$.

(a)  Calculate  the  theoretical  autocorrelations  at  lag  1  and  lag  5  for  this  model. 

(b) Calculate the sample autocorrelations at lag 1 and lag 5 and compare the values
with their theoretical values. Use Equations (6.1.5) and (6.1.6) on page 111
to quantify the comparisons.

(c) Repeat part (b) with a new simulation. Describe how the precision of the estimate
varies with different samples selected under identical conditions.

(d) If software permits, repeat the simulation of the series and calculation of $r_1$
and $r_5$ many times and form the sampling distributions of $r_1$ and $r_5$. Describe
how the precision of the estimate varies with different samples selected under
identical conditions. How well does the large-sample variance given in Equation
(6.1.5) on page 111 approximate the variance in your sampling distribution?

6.21 Simulate an MA(1) time series with n = 60 and with $\theta = 0.5$.

(a) Calculate the theoretical autocorrelation at lag 1 for this model.

(b) Calculate the sample autocorrelation at lag 1, and compare the value with its
theoretical value. Use Exhibit 6.2 on page 112 to quantify the comparisons.

(c) Repeat part (b) with a new simulation. Describe how the precision of the estimate
varies with different samples selected under identical conditions.

(d) If software permits, repeat the simulation of the series and calculation of $r_1$
many times and form the sampling distribution of $r_1$. Describe how the precision
of the estimate varies with different samples selected under identical conditions.
How well does the large-sample variance given in Exhibit 6.2 on page
112 approximate the variance in your sampling distribution?

6.22 Simulate an AR(1) time series with n = 48, with

(a) $\phi = 0.9$, and calculate the theoretical autocorrelations at lag 1 and lag 5;

(b) $\phi = 0.6$, and calculate the theoretical autocorrelations at lag 1 and lag 5;

(c) $\phi = 0.3$, and calculate the theoretical autocorrelations at lag 1 and lag 5.

(d) For each of the series in parts (a), (b), and (c), calculate the sample autocorrelations
at lag 1 and lag 5 and compare the values with their theoretical values.
Use Equations (6.1.5) and (6.1.6) on page 111 to quantify the comparisons. In
general, describe how the precision of the estimate varies with the value of $\phi$.

6.23 Simulate an AR(1) time series with $\phi = 0.6$, with

(a) n = 24, and estimate $\rho_1 = \phi = 0.6$ with $r_1$;

(b) n = 60, and estimate $\rho_1 = \phi = 0.6$ with $r_1$;

(c) n = 120, and estimate $\rho_1 = \phi = 0.6$ with $r_1$.

(d) For each of the series in parts (a), (b), and (c), compare the estimated values
with the theoretical value. Use Equation (6.1.5) on page 111 to quantify the
comparisons. In general, describe how the precision of the estimate varies
with the sample size.

6.24 Simulate an MA(1) time series with $\theta = 0.7$, with

(a) n = 24, and estimate $\rho_1$ with $r_1$;

(b) n = 60, and estimate $\rho_1$ with $r_1$;

(c) n = 120, and estimate $\rho_1$ with $r_1$.

(d) For each of the series in parts (a), (b), and (c), compare the estimated values of
$\rho_1$ with the theoretical value. Use Exhibit 6.2 on page 112 to quantify the
comparisons. In general, describe how the precision of the estimate varies
with the sample size.

6.25 Simulate an AR(1) time series of length n = 36 with $\phi = 0.7$.

(a)  Calculate  and  plot  the  theoretical  autocorrelation  function  for  this  model.  Plot 
sufficient  lags  until  the  correlations  are  negligible. 

(b)  Calculate  and  plot  the  sample  ACF  for  your  simulated  series.  How  well  do  the 
values  and  patterns  match  the  theoretical  ACF  from  part  (a)? 

(c)  What  are  the  theoretical  partial  autocorrelations  for  this  model? 

(d)  Calculate  and  plot  the  sample  ACF  for  your  simulated  series.  How  well  do  the 
values  and  patterns  match  the  theoretical  ACF  from  part  (a)?  Use  the 
large-sample  standard  errors  reported  in  Exhibit  6.1  on  page  111,  to  quantify 
your  answer. 

(e)  Calculate  and  plot  the  sample  PACF  for  your  simulated  series.  How  well  do 
the  values  and  patterns  match  the  theoretical  PACF  from  part  (c)?  Use  the 
large-sample standard errors reported on page 115 to quantify your answer.

6.26 Simulate an MA(1) time series of length n = 48 with $\theta = 0.5$.

(a) What are the theoretical autocorrelations for this model?

(b) Calculate and plot the sample ACF for your simulated series. How well do the
values and patterns match the theoretical ACF from part (a)?

(c) Calculate and plot the theoretical partial autocorrelation function for this
model. Plot sufficient lags until the correlations are negligible. (Hint: See
Equation (6.2.6) on page 114.)

(d) Calculate and plot the sample PACF for your simulated series. How well do
the values and patterns match the theoretical PACF from part (c)?

6.27 Simulate an AR(2) time series of length n = 72 with $\phi_1 = 0.7$ and $\phi_2 = -0.4$.

(a)  Calculate  and  plot  the  theoretical  autocorrelation  function  for  this  model.  Plot 
sufficient  lags  until  the  correlations  are  negligible. 

(b)  Calculate  and  plot  the  sample  ACF  for  your  simulated  series.  How  well  do  the 
values  and  patterns  match  the  theoretical  ACF  from  part  (a)? 

(c)  What  are  the  theoretical  partial  autocorrelations  for  this  model? 

(d)  Calculate  and  plot  the  sample  ACF  for  your  simulated  series.  How  well  do  the 
values  and  patterns  match  the  theoretical  ACF  from  part  (a)? 

(e) Calculate and plot the sample PACF for your simulated series. How well do
the values and patterns match the theoretical PACF from part (c)?

6.28 Simulate an MA(2) time series of length n = 36 with $\theta_1 = 0.7$ and $\theta_2 = -0.4$.

(a) What are the theoretical autocorrelations for this model?

(b) Calculate and plot the sample ACF for your simulated series. How well do the
values and patterns match the theoretical ACF from part (a)?

(c) Plot the theoretical partial autocorrelation function for this model. Plot sufficient
lags until the correlations are negligible. (We do not have a formula for
this PACF. Instead, perform a very large sample simulation, say n = 1000, for
this model and calculate and plot the sample PACF for this simulation.)

(d) Calculate and plot the sample PACF for your simulated series of part (a). How
well do the values and patterns match the “theoretical” PACF from part (c)?

6.29 Simulate a mixed ARMA(1,1) model of length n = 60 with $\phi = 0.4$ and $\theta = 0.6$.

(a)  Calculate  and  plot  the  theoretical  autocorrelation  function  for  this  model.  Plot 
sufficient  lags  until  the  correlations  are  negligible. 

(b)  Calculate  and  plot  the  sample  ACF  for  your  simulated  series.  How  well  do  the 
values  and  patterns  match  the  theoretical  ACF  from  part  (a)? 

(c)  Calculate  and  interpret  the  sample  EACF  for  this  series.  Does  the  EACF  help 
you  specify  the  correct  orders  for  the  model? 

(d) Repeat parts (b) and (c) with a new simulation using the same parameter values
and sample size.

(e) Repeat parts (b) and (c) with a new simulation using the same parameter values
but sample size n = 36.

(f) Repeat parts (b) and (c) with a new simulation using the same parameter values
but sample size n = 120.

6.30 Simulate a mixed ARMA(1,1) model of length n = 100 with $\phi = 0.8$ and $\theta = 0.4$.

(a)  Calculate  and  plot  the  theoretical  autocorrelation  function  for  this  model.  Plot 
sufficient  lags  until  the  correlations  are  negligible. 

(b)  Calculate  and  plot  the  sample  ACF  for  your  simulated  series.  How  well  do  the 
values  and  patterns  match  the  theoretical  ACF  from  part  (a)? 

(c)  Calculate  and  interpret  the  sample  EACF  for  this  series.  Does  the  EACF  help 
you  specify  the  correct  orders  for  the  model? 

(d) Repeat parts (b) and (c) with a new simulation using the same parameter values
and sample size.

(e) Repeat parts (b) and (c) with a new simulation using the same parameter values
but sample size n = 48.

(f) Repeat parts (b) and (c) with a new simulation using the same parameter values
but sample size n = 200.

6.31 Simulate a nonstationary time series with n = 60 according to the model
ARIMA(0,1,1) with $\theta = 0.8$.

(a) Perform the (augmented) Dickey-Fuller test on the series with k = 0 in Equation
(6.4.1) on page 128. (With k = 0, this is the Dickey-Fuller test and is not
augmented.) Comment on the results.

(b) Perform the augmented Dickey-Fuller test on the series with k chosen by the
software, that is, the “best” value for k. Comment on the results.

(c) Repeat parts (a) and (b) but use the differences of the simulated series. Comment
on the results. (Here, of course, you should reject the unit root hypothesis.)

6.32 Simulate a stationary time series of length n = 36 according to an AR(1) model
with $\phi = 0.95$. This model is stationary, but just barely so. With such a series and a
short history, it will be difficult if not impossible to distinguish between stationary
and nonstationary with a unit root.

(a)  Plot  the  series  and  calculate  the  sample  ACF  and  PACF  and  describe  what  you 
see. 

(b) Perform the (augmented) Dickey-Fuller test on the series with k = 0 in Equation
(6.4.1) on page 128. (With k = 0, this is the Dickey-Fuller test and is not
augmented.) Comment on the results.

(c) Perform the augmented Dickey-Fuller test on the series with k chosen by the
software, that is, the “best” value for k. Comment on the results.

(d) Repeat parts (a), (b), and (c) but with a new simulation with n = 100.

6.33 The data file named deere1 contains 82 consecutive values for the amount of
deviation  (in  0.000025  inch  units)  from  a  specified  target  value  that  an  industrial 
machining  process  at  Deere  &  Co.  produced  under  certain  specified  operating 
conditions. 

(a)  Display  the  time  series  plot  of  this  series  and  comment  on  any  unusual  points. 

(b)  Calculate  the  sample  ACF  for  this  series  and  comment  on  the  results. 

(c)  Now  replace  the  unusual  value  by  a  much  more  typical  value  and  recalculate 
the  sample  ACF.  Comment  on  the  change  from  what  you  saw  in  part  (b). 

(d)  Calculate  the  sample  PACF  based  on  the  revised  series  that  you  used  in  part 
(c).  What  model  would  you  specify  for  the  revised  series?  (Later  we  will 
investigate  other  ways  to  handle  outliers  in  time  series  modeling.) 

6.34  The  data  file  named  deere2  contains  102  consecutive  values  for  the  amount  of 
deviation  (in  0.0000025  inch  units)  from  a  specified  target  value  that  another 
industrial  machining  process  produced  at  Deere  &  Co. 

(a)  Display  the  time  series  plot  of  this  series  and  comment  on  its  appearance. 
Would  a  stationary  model  seem  to  be  appropriate? 

(b) Display the sample ACF and PACF for this series and select tentative orders
for an ARMA model for the series.

6.35 The data file named deere3 contains 57 consecutive measurements recorded from
a  complex  machine  tool  at  Deere  &  Co.  The  values  given  are  deviations  from  a 
target  value  in  units  of  ten  millionths  of  an  inch.  The  process  employs  a  control 
mechanism  that  resets  some  of  the  parameters  of  the  machine  tool  depending  on 
the  magnitude  of  deviation  from  target  of  the  last  item  produced. 

(a)  Display  the  time  series  plot  of  this  series  and  comment  on  its  appearance. 
Would  a  stationary  model  be  appropriate  here? 

(b)  Display  the  sample  ACF  and  PACF  for  this  series  and  select  tentative  orders 
for  an  ARMA  model  for  the  series. 

6.36  The  data  file  named  robot  contains  a  time  series  obtained  from  an  industrial  robot. 
The  robot  was  put  through  a  sequence  of  maneuvers,  and  the  distance  from  a 
desired  ending  point  was  recorded  in  inches.  This  was  repeated  324  times  to  form 
the  time  series. 

(a)  Display  the  time  series  plot  of  the  data.  Based  on  this  information,  do  these 
data  appear  to  come  from  a  stationary  or  nonstationary  process? 

(b) Calculate and plot the sample ACF and PACF for these data. Based on this
additional information, do these data appear to come from a stationary or nonstationary
process?

(c) Calculate and interpret the sample EACF.

(d) Use the best subsets ARMA approach to specify a model for these data. Compare
these results with what you discovered in parts (a), (b), and (c).

6.37  Calculate  and  interpret  the  sample  EACF  for  the  logarithms  of  the  Los  Angeles 
rainfall  series.  The  data  are  in  the  file  named  larain.  Do  the  results  confirm  that  the 
logs  are  white  noise? 

6.38  Calculate  and  interpret  the  sample  EACF  for  the  color  property  time  series.  The 
data  are  in  the  color  file.  Does  the  sample  EACF  suggest  the  same  model  that  was 
specified  by  looking  at  the  sample  PACF? 

6.39 The data file named days contains accounting data from the Winegard Co. of Burlington,
Iowa. The data are the number of days until Winegard receives payment
for 130 consecutive orders from a particular distributor of Winegard products.
(The name of the distributor must remain anonymous for confidentiality reasons.)

(a) Plot the time series, and comment on the display. Are there any unusual values?

(b) Calculate the sample ACF and PACF for this series.

(c) Now replace each of the unusual values with a value of 35 days (a much more
typical value) and repeat the calculation of the sample ACF and PACF.
What ARMA model would you specify for this series after removing the outliers?
(Later we will investigate other ways to handle outliers in time series
modeling.)

This chapter deals with the problem of estimating the parameters of an ARIMA model
based on the observed time series $Y_1, Y_2, \ldots, Y_n$. We assume that a model has already
been specified; that is, we have specified values for p, d, and q using the methods of
Chapter 6. With regard to nonstationarity, since the dth difference of the observed series
is assumed to be a stationary ARMA(p,q) process, we need only concern ourselves with
the problem of estimating the parameters in such stationary models. In practice, then, we
treat the dth difference of the original time series as the time series from which we estimate
the parameters of the complete model. For simplicity, we shall let $Y_1, Y_2, \ldots, Y_n$
denote our observed stationary process even though it may be an appropriate difference
of the original series. We first discuss the method-of-moments estimators, then the least
squares estimators, and finally full maximum likelihood estimators.

7.1  The  Method  of  Moments 


The method of moments is frequently one of the easiest, if not the most efficient, methods
for obtaining parameter estimates. The method consists of equating sample
moments  to  corresponding  theoretical  moments  and  solving  the  resulting  equations  to 
obtain  estimates  of  any  unknown  parameters.  The  simplest  example  of  the  method  is  to 
estimate  a  stationary  process  mean  by  a  sample  mean.  The  properties  of  this  estimator 
were  studied  extensively  in  Chapter  3. 

Autoregressive  Models 

Consider first the AR(1) case. For this process, we have the simple relationship $\rho_1 = \phi$.
In the method of moments, $\rho_1$ is equated to $r_1$, the lag 1 sample autocorrelation. Thus
we can estimate $\phi$ by

$\hat{\phi} = r_1$ (7.1.1)

Now consider the AR(2) case. The relationships between the parameters $\phi_1$ and $\phi_2$
and various moments are given by the Yule-Walker equations (4.3.13) on page 72:

$\rho_1 = \phi_1 + \rho_1\phi_2$ and $\rho_2 = \rho_1\phi_1 + \phi_2$

The method of moments replaces $\rho_1$ by $r_1$ and $\rho_2$ by $r_2$ to obtain

$r_1 = \phi_1 + r_1\phi_2$ and $r_2 = r_1\phi_1 + \phi_2$

which are then solved to obtain

$\hat{\phi}_1 = \frac{r_1(1 - r_2)}{1 - r_1^2}$ and $\hat{\phi}_2 = \frac{r_2 - r_1^2}{1 - r_1^2}$ (7.1.2)


The general AR($p$) case proceeds similarly. Replace $\rho_k$ by $r_k$ throughout the
Yule-Walker equations on page 79 (or page 114) to obtain

$$
\begin{aligned}
\phi_1 + r_1\phi_2 + r_2\phi_3 + \cdots + r_{p-1}\phi_p &= r_1 \\
r_1\phi_1 + \phi_2 + r_1\phi_3 + \cdots + r_{p-2}\phi_p &= r_2 \\
&\;\;\vdots \\
r_{p-1}\phi_1 + r_{p-2}\phi_2 + r_{p-3}\phi_3 + \cdots + \phi_p &= r_p
\end{aligned}
\tag{7.1.3}
$$

These linear equations are then solved for $\hat{\phi}_1, \hat{\phi}_2, \ldots, \hat{\phi}_p$. The Durbin-Levinson
recursion of Equation (6.2.9) on page 115 provides a convenient method of solution but is
subject to substantial round-off errors if the solution is close to the boundary of the
stationarity region. The estimates obtained in this way are also called Yule-Walker
estimates.
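The linear system (7.1.3) can also be solved directly as a matrix equation. The following is a minimal sketch in base R, not the book's own code: the helper name `yule_walker` is hypothetical, and the simulated series is only an illustrative stand-in for real data. It builds the Toeplitz matrix of sample autocorrelations and compares the result with the closed-form AR(2) solution in Equations (7.1.2).

```r
# Hypothetical helper (an illustrative assumption, not a TSA function):
# method-of-moments (Yule-Walker) estimates for an AR(p) model,
# obtained by solving the linear system in Equations (7.1.3).
yule_walker <- function(y, p) {
  # Sample autocorrelations r_0 = 1, r_1, ..., r_p
  r <- acf(y, lag.max = p, plot = FALSE, demean = TRUE)$acf[, 1, 1]
  R <- toeplitz(r[1:p])          # p x p matrix with entries r_|i-j|
  drop(solve(R, r[2:(p + 1)]))   # phi-hat_1, ..., phi-hat_p
}

set.seed(1)
# Simulated AR(2) series, used only for illustration
y <- arima.sim(model = list(ar = c(1.5, -0.75)), n = 120)
phi_hat <- yule_walker(y, 2)

# Closed-form AR(2) solution, Equations (7.1.2)
r <- acf(y, lag.max = 2, plot = FALSE)$acf[2:3, 1, 1]
phi_closed <- c(r[1] * (1 - r[2]), r[2] - r[1]^2) / (1 - r[1]^2)
print(rbind(phi_hat, phi_closed))
```

For moderate $p$ a direct solve like this is usually adequate; the Durbin-Levinson recursion mentioned above reaches the same solution recursively.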


Moving  Average  Models 

Surprisingly, the method of moments is not nearly as convenient when applied to moving
average models. Consider the simple MA(1) case. From Equations (4.2.2) on
page 57, we know that

$$\rho_1 = -\frac{\theta}{1 + \theta^2}$$

Equating $\rho_1$ to $r_1$, we are led to solve a quadratic equation in $\theta$. If $|r_1| < 0.5$, then the two
real roots are given by

$$-\frac{1}{2r_1} \pm \sqrt{\frac{1}{4r_1^2} - 1}$$

As can be easily checked, the product of the two solutions is always equal to 1; therefore,
only one of the solutions satisfies the invertibility condition $|\theta| < 1$.

After further algebraic manipulation, we see that the invertible solution can be written
as

$$\hat{\theta} = \frac{-1 + \sqrt{1 - 4r_1^2}}{2r_1} \tag{7.1.4}$$

If $r_1 = \pm 0.5$, unique, real solutions exist, namely $\mp 1$, but neither is invertible. If $|r_1| > 0.5$
(which is certainly possible even though $|\rho_1| < 0.5$), no real solutions exist, and so the
method of moments fails to yield an estimator of $\theta$. Of course, if $|r_1| > 0.5$, the specification
of an MA(1) model would be in considerable doubt.
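As a small illustration of Equation (7.1.4), the sketch below computes the invertible method-of-moments estimate from a given lag 1 sample autocorrelation and returns `NA` when no invertible real solution exists. The function name `ma1_mom` is hypothetical (it is not the TSA routine), and the input values are chosen only for illustration.

```r
# Hypothetical helper: invertible method-of-moments estimate of theta in
# the MA(1) model Y_t = e_t - theta * e_{t-1}, following Equation (7.1.4).
ma1_mom <- function(r1) {
  if (abs(r1) >= 0.5) return(NA_real_)  # no invertible real solution
  if (r1 == 0) return(0)                # limiting case of (7.1.4)
  (-1 + sqrt(1 - 4 * r1^2)) / (2 * r1)
}

theta_hat <- ma1_mom(-0.4)
other_root <- 1 / theta_hat   # the two roots of the quadratic multiply to 1
print(c(theta_hat, other_root))

# Both roots reproduce the same r1 = -theta / (1 + theta^2)
roots <- c(theta_hat, other_root)
print(-roots / (1 + roots^2))

print(ma1_mom(0.544))         # |r1| > 0.5, so no estimate exists
```

The check that both roots map back to the same $r_1$ shows why the invertibility condition is needed to pick a unique estimate.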
For higher-order MA models, the method of moments quickly gets complicated.
We can use Equations (4.2.5) on page 65 and replace $\rho_k$ by $r_k$ for $k = 1, 2, \ldots, q$, to
obtain $q$ equations in $q$ unknowns $\theta_1, \theta_2, \ldots, \theta_q$. The resulting equations are highly
nonlinear in the $\theta$'s, however, and their solution would of necessity be numerical. In
addition, there will be multiple solutions, of which only one is invertible. We shall not
pursue this further since we shall see in Section 7.4 that, for MA models, the method of
moments generally produces poor estimates.


Mixed  Models 


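We consider only the ARMA(1,1) case. Recall Equation (4.4.5) on page 78,

$$\rho_k = \frac{(1 - \theta\phi)(\phi - \theta)}{1 - 2\theta\phi + \theta^2}\,\phi^{k-1} \quad\text{for } k \ge 1$$

Noting that $\rho_2/\rho_1 = \phi$, we can first estimate $\phi$ as

$$\hat{\phi} = \frac{r_2}{r_1} \tag{7.1.5}$$

Having done so, we can then use

$$r_1 = \frac{(1 - \theta\hat{\phi})(\hat{\phi} - \theta)}{1 - 2\theta\hat{\phi} + \theta^2} \tag{7.1.6}$$

to solve for $\hat{\theta}$. Note again that a quadratic equation must be solved and only the
invertible solution, if any, retained.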
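The two-step procedure in Equations (7.1.5) and (7.1.6) can be sketched as follows. Multiplying out (7.1.6) gives the quadratic $a\theta^2 + b\theta + a = 0$ with $a = r_1 - \hat{\phi}$ and $b = 1 + \hat{\phi}^2 - 2r_1\hat{\phi}$, whose roots again multiply to 1. The function name `arma11_mom` is hypothetical, and the check uses theoretical autocorrelations (not data) so that the known parameters should be recovered.

```r
# Hypothetical helper: method-of-moments estimates for ARMA(1,1),
# Y_t = phi * Y_{t-1} + e_t - theta * e_{t-1}.
arma11_mom <- function(r1, r2) {
  phi <- r2 / r1                            # Equation (7.1.5)
  a <- r1 - phi
  b <- 1 + phi^2 - 2 * r1 * phi
  # Equation (7.1.6) rearranged: a * theta^2 + b * theta + a = 0
  if (a == 0) return(c(phi = phi, theta = 0))
  disc <- b^2 - 4 * a^2
  if (disc <= 0) return(c(phi = phi, theta = NA_real_))  # no invertible real root
  roots <- (-b + c(-1, 1) * sqrt(disc)) / (2 * a)
  c(phi = phi, theta = roots[abs(roots) < 1])            # keep the invertible root
}

# Theoretical autocorrelations for phi = 0.6, theta = -0.3, from (4.4.5)
phi0 <- 0.6; theta0 <- -0.3
rho1 <- (1 - theta0 * phi0) * (phi0 - theta0) / (1 - 2 * theta0 * phi0 + theta0^2)
rho2 <- phi0 * rho1
est <- arma11_mom(rho1, rho2)
print(est)
```

With sample autocorrelations in place of `rho1` and `rho2`, the same function gives the method-of-moments estimates, or `NA` for $\hat{\theta}$ when the quadratic has no invertible real root.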


Estimates  of  the  Noise  Variance 

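The final parameter to be estimated is the noise variance, $\sigma_e^2$. In all cases, we can first
estimate the process variance, $\gamma_0 = Var(Y_t)$, by the sample variance

$$s^2 = \frac{1}{n-1}\sum_{t=1}^{n}(Y_t - \bar{Y})^2 \tag{7.1.7}$$

and use known relationships from Chapter 4 among $\gamma_0$, $\sigma_e^2$, and the $\theta$'s and $\phi$'s to
estimate $\sigma_e^2$.

For the AR($p$) models, Equation (4.3.31) on page 77 yields

$$\hat{\sigma}_e^2 = (1 - \hat{\phi}_1 r_1 - \hat{\phi}_2 r_2 - \cdots - \hat{\phi}_p r_p)\,s^2 \tag{7.1.8}$$

In particular, for an AR(1) process,

$$\hat{\sigma}_e^2 = (1 - r_1^2)\,s^2$$

since $\hat{\phi} = r_1$.

For the MA($q$) case, we have, using Equation (4.2.4) on page 65,

$$\hat{\sigma}_e^2 = \frac{s^2}{1 + \hat{\theta}_1^2 + \hat{\theta}_2^2 + \cdots + \hat{\theta}_q^2} \tag{7.1.9}$$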
For the ARMA(1,1) process, Equation (4.4.4) on page 78 yields

$$\hat{\sigma}_e^2 = \frac{1 - \hat{\phi}^2}{1 - 2\hat{\phi}\hat{\theta} + \hat{\theta}^2}\,s^2 \tag{7.1.10}$$


Numerical  Examples 

The  table  in  Exhibit  7.1  displays  method-of-moments  estimates  for  the  parameters  from 
several  simulated  time  series.  Generally  speaking,  the  estimates  for  all  the  autoregres¬ 
sive  models  are  fairly  good  but  the  estimates  for  the  moving  average  models  are  not 
acceptable.  It  can  be  shown  that  theory  confirms  this  observation — method-of-moments 
estimators  are  very  inefficient  for  models  containing  moving  average  terms. 


Exhibit 7.1  Method-of-Moments Parameter Estimates for Simulated Series

            True Parameters                  Method-of-Moments Estimates
Model    $\theta$    $\phi_1$    $\phi_2$        $\hat{\theta}$    $\hat{\phi}_1$    $\hat{\phi}_2$       n
MA(1)     -0.9                                  -0.554                               120
MA(1)      0.9                                   0.719                               120
MA(1)     -0.9                                   NA†                                  60
MA(1)      0.5                                  -0.314                                60
AR(1)                 0.9                                    0.831                    60
AR(1)                 0.4                                    0.470                    60
AR(2)                 1.5      -0.75                         1.472       -0.767      120

† No method-of-moments estimate exists since $r_1 = 0.544$ for this simulation.


> data(ma1.2.s); data(ma1.1.s); data(ma1.3.s); data(ma1.4.s)
> estimate.ma1.mom(ma1.2.s); estimate.ma1.mom(ma1.1.s)
> estimate.ma1.mom(ma1.3.s); estimate.ma1.mom(ma1.4.s)
> arima(ma1.4.s,order=c(0,0,1),method='CSS',include.mean=F)
> data(ar1.s); data(ar1.2.s)
> ar(ar1.s,order.max=1,aic=F,method='yw')
> ar(ar1.2.s,order.max=1,aic=F,method='yw')
> data(ar2.s)
> ar(ar2.s,order.max=2,aic=F,method='yw')


Consider  now  some  actual  time  series.  We  start  with  the  Canadian  hare  abundance 
series.  Since  we  found  in  Exhibit  6.27  on  page  136  that  a  square  root  transformation  was 
appropriate  here,  we  base  all  modeling  on  the  square  root  of  the  original  abundance 
numbers.  We  illustrate  the  estimation  of  an  AR(2)  model  with  the  hare  data,  even 
though we shall show later that an AR(3) model provides a better fit to the data. The first
two sample autocorrelations displayed in Exhibit 6.28 on page 137 are $r_1 = 0.736$ and
$r_2 = 0.304$. Using Equations (7.1.2), the method-of-moments estimates of $\phi_1$ and $\phi_2$ are

$$\hat{\phi}_1 = \frac{r_1(1 - r_2)}{1 - r_1^2} = \frac{0.736(1 - 0.304)}{1 - (0.736)^2} = 1.1178 \tag{7.1.11}$$

and

$$\hat{\phi}_2 = \frac{r_2 - r_1^2}{1 - r_1^2} = \frac{0.304 - (0.736)^2}{1 - (0.736)^2} = -0.519 \tag{7.1.12}$$

The sample mean and variance of this series (after taking the square root) are found to
be 5.82 and 5.88, respectively. Then, using Equation (7.1.8), we estimate the noise
variance as

$$
\begin{aligned}
\hat{\sigma}_e^2 &= (1 - \hat{\phi}_1 r_1 - \hat{\phi}_2 r_2)\,s^2 \\
&= [1 - (1.1178)(0.736) - (-0.519)(0.304)](5.88) \\
&= 1.97
\end{aligned}
\tag{7.1.13}
$$

The estimated model (in original terms) is then

$$\sqrt{Y_t} - 5.82 = 1.1178(\sqrt{Y_{t-1}} - 5.82) - 0.519(\sqrt{Y_{t-2}} - 5.82) + e_t \tag{7.1.14}$$

or

$$\sqrt{Y_t} = 2.335 + 1.1178\sqrt{Y_{t-1}} - 0.519\sqrt{Y_{t-2}} + e_t \tag{7.1.15}$$

with estimated noise variance of 1.97.
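The arithmetic in this example can be retraced with a few lines of R. This is only a sketch that starts from the rounded summary statistics quoted in the text (it does not load the hare data), so the last digit can differ slightly from the values printed above.

```r
# Rounded summary statistics quoted in the text (square-root scale)
r1 <- 0.736; r2 <- 0.304      # lag 1 and lag 2 sample autocorrelations
ybar <- 5.82; s2 <- 5.88      # sample mean and sample variance

phi1 <- r1 * (1 - r2) / (1 - r1^2)              # Equation (7.1.2)
phi2 <- (r2 - r1^2) / (1 - r1^2)                # Equation (7.1.2)
sigma2_e <- (1 - phi1 * r1 - phi2 * r2) * s2    # Equation (7.1.8)
intercept <- ybar * (1 - phi1 - phi2)           # constant term in (7.1.15)

print(round(c(phi1 = phi1, phi2 = phi2,
              sigma2_e = sigma2_e, intercept = intercept), 3))
```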

Consider now the oil price series. Exhibit 6.32 on page 140 suggested that we specify
an MA(1) model for the first differences of the logarithms of the series. The lag 1
sample autocorrelation in that exhibit is 0.212, so the method-of-moments estimate of $\theta$
is

$$\hat{\theta} = \frac{-1 + \sqrt{1 - 4(0.212)^2}}{2(0.212)} = -0.222 \tag{7.1.16}$$

The mean of the differences of the logs is 0.004 and the variance is 0.0072. The
estimated model is

$$\nabla\log(Y_t) = 0.004 + e_t + 0.222e_{t-1} \tag{7.1.17}$$

or

$$\log(Y_t) = \log(Y_{t-1}) + 0.004 + e_t + 0.222e_{t-1} \tag{7.1.18}$$

with estimated noise variance of

$$\hat{\sigma}_e^2 = \frac{0.0072}{1 + (-0.222)^2} = 0.00686 \tag{7.1.19}$$
Using Equation (3.2.3) on page 28 with estimated parameters yields a standard error of
the sample mean of 0.0060. Thus, the observed sample mean of 0.004 is not significantly
different from zero and we would remove the constant term from the model, giving
a final model of

$$\log(Y_t) = \log(Y_{t-1}) + e_t + 0.222e_{t-1} \tag{7.1.20}$$

7.2  Least  Squares  Estimation 


Because  the  method  of  moments  is  unsatisfactory  for  many  models,  we  must  consider 
other  methods  of  estimation.  We  begin  with  least  squares.  For  autoregressive  models, 
the ideas are quite straightforward. At this point, we introduce a possibly nonzero mean,
$\mu$, into our stationary models and treat it as another parameter to be estimated by least
squares.

Autoregressive  Models 

Consider the first-order case where

$$Y_t - \mu = \phi(Y_{t-1} - \mu) + e_t \tag{7.2.1}$$

We can view this as a regression model with predictor variable $Y_{t-1}$ and response
variable $Y_t$. Least squares estimation then proceeds by minimizing the sum of squares of the
differences

$$(Y_t - \mu) - \phi(Y_{t-1} - \mu)$$

Since only $Y_1, Y_2, \ldots, Y_n$ are observed, we can only sum from $t = 2$ to $t = n$. Let

$$S_c(\phi, \mu) = \sum_{t=2}^{n}\left[(Y_t - \mu) - \phi(Y_{t-1} - \mu)\right]^2 \tag{7.2.2}$$

This is usually called the conditional sum-of-squares function. (The reason for the
term conditional will become apparent later on.) According to the principle of least
squares, we estimate $\phi$ and $\mu$ by the respective values that minimize $S_c(\phi, \mu)$ given the
observed values of $Y_1, Y_2, \ldots, Y_n$.

Consider the equation $\partial S_c/\partial\mu = 0$. We have

$$\frac{\partial S_c}{\partial\mu} = \sum_{t=2}^{n} 2\left[(Y_t - \mu) - \phi(Y_{t-1} - \mu)\right](-1 + \phi) = 0$$

or, simplifying and solving for $\mu$,

$$\mu = \frac{1}{(n-1)(1-\phi)}\left[\sum_{t=2}^{n} Y_t - \phi\sum_{t=2}^{n} Y_{t-1}\right] \tag{7.2.3}$$
Now, for large $n$,

$$\frac{1}{n-1}\sum_{t=2}^{n} Y_t \approx \frac{1}{n-1}\sum_{t=2}^{n} Y_{t-1} \approx \bar{Y}$$

Thus, regardless of the value of $\phi$, Equation (7.2.3) reduces to

$$\hat{\mu} \approx \frac{1}{1-\phi}(\bar{Y} - \phi\bar{Y}) = \bar{Y} \tag{7.2.4}$$

We sometimes say, except for end effects, $\hat{\mu} = \bar{Y}$.

Consider now the minimization of $S_c(\phi, \bar{Y})$ with respect to $\phi$. We have

$$\frac{\partial S_c(\phi, \bar{Y})}{\partial\phi} = -\sum_{t=2}^{n} 2\left[(Y_t - \bar{Y}) - \phi(Y_{t-1} - \bar{Y})\right](Y_{t-1} - \bar{Y})$$

Setting this equal to zero and solving for $\phi$ yields

$$\hat{\phi} = \frac{\sum_{t=2}^{n}(Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=2}^{n}(Y_{t-1} - \bar{Y})^2}$$

Except for one term missing in the denominator, namely $(Y_n - \bar{Y})^2$, this is the same as
$r_1$. The lone missing term is negligible for stationary processes, and thus the least
squares and method-of-moments estimators are nearly identical, especially for large
samples.
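The near-equality of the conditional least squares estimate and $r_1$ is easy to see numerically. The sketch below uses a simulated AR(1) series (an illustrative assumption, not one of the book's data sets) and computes both quantities from the same deviations; they differ only through the single term $(Y_n - \bar{Y})^2$ in the denominator.

```r
set.seed(42)
# Simulated AR(1) series with phi = 0.7, used only for illustration
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))
n <- length(y)
d <- y - mean(y)                        # deviations from the sample mean

num <- sum(d[2:n] * d[1:(n - 1)])       # sum of lagged products
phi_cls <- num / sum(d[1:(n - 1)]^2)    # conditional least squares estimate
r1 <- num / sum(d^2)                    # lag 1 sample autocorrelation

print(c(phi_cls = phi_cls, r1 = r1))
```

Because the denominator of `phi_cls` omits one nonnegative term, `phi_cls` is never smaller in magnitude than `r1`.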

For the general AR($p$) process, the methods used to obtain Equations (7.2.3) and
(7.2.4) can easily be extended to yield the same result, namely

$$\hat{\mu} = \bar{Y} \tag{7.2.5}$$

To generalize the estimation of the $\phi$'s, we consider the second-order model. In
accordance with Equation (7.2.5), we replace $\mu$ by $\bar{Y}$ in the conditional sum-of-squares
function, so

$$S_c(\phi_1, \phi_2, \bar{Y}) = \sum_{t=3}^{n}\left[(Y_t - \bar{Y}) - \phi_1(Y_{t-1} - \bar{Y}) - \phi_2(Y_{t-2} - \bar{Y})\right]^2 \tag{7.2.6}$$

Setting $\partial S_c/\partial\phi_1 = 0$, we have

$$-2\sum_{t=3}^{n}\left[(Y_t - \bar{Y}) - \phi_1(Y_{t-1} - \bar{Y}) - \phi_2(Y_{t-2} - \bar{Y})\right](Y_{t-1} - \bar{Y}) = 0 \tag{7.2.7}$$

which we can rewrite as
$$\sum_{t=3}^{n}(Y_t - \bar{Y})(Y_{t-1} - \bar{Y}) = \left[\sum_{t=3}^{n}(Y_{t-1} - \bar{Y})^2\right]\phi_1 + \left[\sum_{t=3}^{n}(Y_{t-1} - \bar{Y})(Y_{t-2} - \bar{Y})\right]\phi_2 \tag{7.2.8}$$

The sum of the lagged products $\sum_{t=3}^{n}(Y_t - \bar{Y})(Y_{t-1} - \bar{Y})$ is very nearly the numerator of
$r_1$; we are missing one product, $(Y_2 - \bar{Y})(Y_1 - \bar{Y})$. A similar situation exists for
$\sum_{t=3}^{n}(Y_{t-1} - \bar{Y})(Y_{t-2} - \bar{Y})$, but here we are missing $(Y_n - \bar{Y})(Y_{n-1} - \bar{Y})$. If we divide
both sides of Equation (7.2.8) by $\sum_{t=3}^{n}(Y_t - \bar{Y})^2$, then, except for end effects, which are
negligible under the stationarity assumption, we obtain

$$r_1 = \phi_1 + r_1\phi_2 \tag{7.2.9}$$

Approximating in a similar way with the equation $\partial S_c/\partial\phi_2 = 0$ leads to

$$r_2 = r_1\phi_1 + \phi_2 \tag{7.2.10}$$

But Equations (7.2.9) and (7.2.10) are just the sample Yule-Walker equations for an
AR(2) model.

Entirely analogous results follow for the general stationary AR($p$) case: To an
excellent approximation, the conditional least squares estimates of the $\phi$'s are obtained
by solving the sample Yule-Walker equations (7.1.3).†

† We note that Lai and Wei (1983) established that the conditional least squares estimators
are consistent even for nonstationary autoregressive models where the Yule-Walker
equations do not apply.


Moving Average Models

Consider now the least-squares estimation of $\theta$ in the MA(1) model:

$$Y_t = e_t - \theta e_{t-1} \tag{7.2.11}$$

At first glance, it is not apparent how a least squares or regression method can be
applied to such models. However, recall from Equation (4.4.2) on page 77 that
invertible MA(1) models can be expressed as

$$Y_t = -\theta Y_{t-1} - \theta^2 Y_{t-2} - \theta^3 Y_{t-3} - \cdots + e_t$$

an autoregressive model but of infinite order. Thus least squares can be meaningfully
carried out by choosing a value of $\theta$ that minimizes

$$S_c(\theta) = \sum (e_t)^2 = \sum\left[Y_t + \theta Y_{t-1} + \theta^2 Y_{t-2} + \theta^3 Y_{t-3} + \cdots\right]^2 \tag{7.2.12}$$


where, implicitly, $e_t = e_t(\theta)$ is a function of the observed series and the unknown
parameter $\theta$.

It is clear from Equation (7.2.12) that the least squares problem is nonlinear in the
parameters. We will not be able to minimize $S_c(\theta)$ by taking a derivative with respect to
$\theta$, setting it to zero, and solving. Thus, even for the simple MA(1) model, we must resort
to techniques of numerical optimization. Other problems exist in this case: We have not
shown explicit limits on the summation in Equation (7.2.12) nor have we said how to
deal with the infinite series under the summation sign.

To address these issues, consider evaluating $S_c(\theta)$ for a single given value of $\theta$. The
only $Y$'s we have available are our observed series, $Y_1, Y_2, \ldots, Y_n$. Rewrite Equation
(7.2.11) as

$$e_t = Y_t + \theta e_{t-1} \tag{7.2.13}$$

Using this equation, $e_1, e_2, \ldots, e_n$ can be calculated recursively if we have the initial
value $e_0$. A common approximation is to set $e_0 = 0$, its expected value. Then,
conditional on $e_0 = 0$, we can obtain

$$
\begin{aligned}
e_1 &= Y_1 \\
e_2 &= Y_2 + \theta e_1 \\
e_3 &= Y_3 + \theta e_2 \\
&\;\;\vdots \\
e_n &= Y_n + \theta e_{n-1}
\end{aligned}
\tag{7.2.14}
$$

and thus calculate $S_c(\theta) = \sum (e_t)^2$, conditional on $e_0 = 0$, for that single given value of
$\theta$.

For the simple case of one parameter, we could carry out a grid search over the
invertible range $(-1, +1)$ for $\theta$ to find the minimum sum of squares. For more general
MA($q$) models, a numerical optimization algorithm, such as Gauss-Newton or
Nelder-Mead, will be needed.
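The recursion in Equations (7.2.13) and (7.2.14) and the grid search just described can be sketched in a few lines. The function name `css_ma1`, the grid spacing, and the simulated series are illustrative assumptions. Note that `arima.sim` writes the MA(1) model as $Y_t = e_t + \text{ma}\cdot e_{t-1}$, so its `ma` coefficient has the opposite sign to $\theta$ as defined in this chapter.

```r
# Conditional sum of squares for MA(1), Y_t = e_t - theta * e_{t-1},
# using the recursion e_t = Y_t + theta * e_{t-1} with e_0 = 0.
css_ma1 <- function(theta, y) {
  e_prev <- 0   # start-up value e_0 = 0
  ss <- 0
  for (t in seq_along(y)) {
    e_t <- y[t] + theta * e_prev
    ss <- ss + e_t^2
    e_prev <- e_t
  }
  ss
}

set.seed(123)
# Simulated MA(1) series with theta = -0.9 (ma = +0.9 in arima.sim's convention)
y <- as.numeric(arima.sim(model = list(ma = 0.9), n = 120))

grid <- seq(-0.99, 0.99, by = 0.01)    # invertible range for theta
ss <- sapply(grid, css_ma1, y = y)     # S_c(theta) at each grid point
theta_css <- grid[which.min(ss)]
print(theta_css)
```

A grid this fine is wasteful but transparent; in practice a one-dimensional optimizer over the same function would serve equally well.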

For higher-order moving average models, the ideas are analogous and no new
difficulties arise. We compute $e_t = e_t(\theta_1, \theta_2, \ldots, \theta_q)$ recursively from

$$e_t = Y_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q} \tag{7.2.15}$$

with $e_0 = e_{-1} = \cdots = e_{-q} = 0$. The sum of squares is minimized jointly in $\theta_1, \theta_2, \ldots, \theta_q$
using a multivariate numerical method.


Mixed  Models 

Consider the ARMA(1,1) case

$$Y_t = \phi Y_{t-1} + e_t - \theta e_{t-1} \tag{7.2.16}$$

As in the pure MA case, we consider $e_t = e_t(\phi, \theta)$ and wish to minimize $S_c(\phi, \theta) = \sum e_t^2$.
We can rewrite Equation (7.2.16) as

$$e_t = Y_t - \phi Y_{t-1} + \theta e_{t-1} \tag{7.2.17}$$

To obtain $e_1$, we now have an additional "startup" problem, namely $Y_0$. One approach is
to set $Y_0 = 0$ or to $\bar{Y}$ if our model contains a nonzero mean. However, a better approach
is to begin the recursion at $t = 2$, thus avoiding $Y_0$ altogether, and simply minimize

$$S_c(\phi, \theta) = \sum_{t=2}^{n} e_t^2$$

For the general ARMA($p$,$q$) model, we compute

$$
\begin{aligned}
e_t = {} & Y_t - \phi_1 Y_{t-1} - \phi_2 Y_{t-2} - \cdots - \phi_p Y_{t-p} \\
& + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \cdots + \theta_q e_{t-q}
\end{aligned}
\tag{7.2.18}
$$

with $e_p = e_{p-1} = \cdots = e_{p+1-q} = 0$ and then minimize $S_c(\phi_1, \phi_2, \ldots, \phi_p, \theta_1, \theta_2, \ldots, \theta_q)$
numerically to obtain the conditional least squares estimates of all the parameters.

For parameter sets $\theta_1, \theta_2, \ldots, \theta_q$ corresponding to invertible models, the start-up
values $e_p, e_{p-1}, \ldots, e_{p+1-q}$ will have very little influence on the final estimates of the
parameters for large samples.
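For the ARMA(1,1) case, the recursion (7.2.17) started at $t = 2$ with $e_1 = 0$ can be handed to a general-purpose optimizer. The following is a sketch under stated assumptions: the function name `css_arma11` is hypothetical, the series is simulated and mean-corrected, and `optim` with its default Nelder-Mead method stands in for whatever numerical routine one prefers. As before, `arima.sim` uses the opposite sign convention for the MA coefficient.

```r
# Conditional sum of squares for ARMA(1,1),
# Y_t = phi * Y_{t-1} + e_t - theta * e_{t-1}, with e_1 = 0 and
# the recursion begun at t = 2.
css_arma11 <- function(par, y) {
  phi <- par[1]; theta <- par[2]
  n <- length(y)
  e <- numeric(n)            # e[1] = 0 is the start-up value
  for (t in 2:n) {
    e[t] <- y[t] - phi * y[t - 1] + theta * e[t - 1]   # Equation (7.2.17)
  }
  sum(e[2:n]^2)
}

set.seed(7)
# Simulated series with phi = 0.6 and theta = -0.3 (ma = +0.3 for arima.sim)
y <- as.numeric(arima.sim(model = list(ar = 0.6, ma = 0.3), n = 100))
y <- y - mean(y)             # work with the mean-corrected series

fit <- optim(c(0, 0), css_arma11, y = y)   # Nelder-Mead by default
print(round(c(phi = fit$par[1], theta = fit$par[2]), 3))
```

The starting values `c(0, 0)` are an arbitrary choice; method-of-moments estimates would be a natural alternative.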

7.3  Maximum  Likelihood  and  Unconditional  Least  Squares 


For series of moderate length and also for stochastic seasonal models to be discussed in
Chapter 10, the start-up values $e_p = e_{p-1} = \cdots = e_{p+1-q} = 0$ will have a more
pronounced effect on the final estimates for the parameters. Thus we are led to consider the
more difficult problem of maximum likelihood estimation.

The  advantage  of  the  method  of  maximum  likelihood  is  that  all  of  the  information 
in  the  data  is  used  rather  than  just  the  first  and  second  moments,  as  is  the  case  with  least 
squares.  Another  advantage  is  that  many  large-sample  results  are  known  under  very 
general  conditions.  One  disadvantage  is  that  we  must  for  the  first  time  work  specifically 
with  the  joint  probability  density  function  of  the  process. 

Maximum  Likelihood  Estimation 

For any set of observations, $Y_1, Y_2, \ldots, Y_n$, time series or not, the likelihood function $L$ is
defined to be the joint probability density of obtaining the data actually observed.
However, it is considered as a function of the unknown parameters in the model with the
observed data held fixed. For ARIMA models, $L$ will be a function of the $\phi$'s, $\theta$'s, $\mu$, and
$\sigma_e^2$ given the observations $Y_1, Y_2, \ldots, Y_n$. The maximum likelihood estimators are then
defined as those values of the parameters for which the data actually observed are most
likely, that is, the values that maximize the likelihood function.

We begin by looking in detail at the AR(1) model. The most common assumption is
that the white noise terms are independent, normally distributed random variables with
zero means and common standard deviation $\sigma_e$. The probability density function (pdf)
of each $e_t$ is then

$$(2\pi\sigma_e^2)^{-1/2}\exp\left(-\frac{e_t^2}{2\sigma_e^2}\right) \quad\text{for } -\infty < e_t < \infty$$

and, by independence, the joint pdf for $e_2, e_3, \ldots, e_n$ is

$$(2\pi\sigma_e^2)^{-(n-1)/2}\exp\left(-\frac{1}{2\sigma_e^2}\sum_{t=2}^{n} e_t^2\right) \tag{7.3.1}$$

Now consider

$$
\begin{aligned}
Y_2 - \mu &= \phi(Y_1 - \mu) + e_2 \\
Y_3 - \mu &= \phi(Y_2 - \mu) + e_3 \\
&\;\;\vdots \\
Y_n - \mu &= \phi(Y_{n-1} - \mu) + e_n
\end{aligned}
\tag{7.3.2}
$$

If we condition on $Y_1 = y_1$, Equation (7.3.2) defines a linear transformation between $e_2$,
$e_3, \ldots, e_n$ and $Y_2, Y_3, \ldots, Y_n$ (with Jacobian equal to 1). Thus the joint pdf of $Y_2, Y_3, \ldots, Y_n$
given $Y_1 = y_1$ can be obtained by using Equation (7.3.2) to substitute for the $e$'s in terms
of the $Y$'s in Equation (7.3.1). Thus we get

$$f(y_2, y_3, \ldots, y_n \mid y_1) = (2\pi\sigma_e^2)^{-(n-1)/2}\exp\left\{-\frac{1}{2\sigma_e^2}\sum_{t=2}^{n}\left[(y_t - \mu) - \phi(y_{t-1} - \mu)\right]^2\right\} \tag{7.3.3}$$

Now consider the (marginal) distribution of $Y_1$. It follows from the linear process
representation of the AR(1) process (Equation (4.3.8) on page 70) that $Y_1$ will have a normal
distribution with mean $\mu$ and variance $\sigma_e^2/(1 - \phi^2)$. Multiplying the conditional pdf in
Equation (7.3.3) by the marginal pdf of $Y_1$ gives us the joint pdf of $Y_1, Y_2, \ldots, Y_n$ that we
require. Interpreted as a function of the parameters $\phi$, $\mu$, and $\sigma_e^2$, the likelihood function
for an AR(1) model is given by

$$L(\phi, \mu, \sigma_e^2) = (2\pi\sigma_e^2)^{-n/2}(1 - \phi^2)^{1/2}\exp\left[-\frac{1}{2\sigma_e^2}S(\phi, \mu)\right] \tag{7.3.4}$$

where

$$S(\phi, \mu) = \sum_{t=2}^{n}\left[(Y_t - \mu) - \phi(Y_{t-1} - \mu)\right]^2 + (1 - \phi^2)(Y_1 - \mu)^2 \tag{7.3.5}$$

The function $S(\phi, \mu)$ is called the unconditional sum-of-squares function.

As a general rule, the logarithm of the likelihood function is more convenient to
work with than the likelihood itself. For the AR(1) case, the log-likelihood function,
denoted $\ell(\phi, \mu, \sigma_e^2)$, is given by

$$\ell(\phi, \mu, \sigma_e^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma_e^2) + \frac{1}{2}\log(1 - \phi^2) - \frac{1}{2\sigma_e^2}S(\phi, \mu) \tag{7.3.6}$$

For given values of $\phi$ and $\mu$, $\ell(\phi, \mu, \sigma_e^2)$ can be maximized analytically with respect
to $\sigma_e^2$ in terms of the yet-to-be-determined estimators of $\phi$ and $\mu$. We obtain

$$\hat{\sigma}_e^2 = \frac{S(\hat{\phi}, \hat{\mu})}{n} \tag{7.3.7}$$

As in many other similar contexts, we usually divide by $n - 2$ rather than $n$ (since we are
estimating two parameters, $\phi$ and $\mu$) to obtain an estimator with less bias. For typical
time series sample sizes, there will be very little difference.

Consider now the estimation of $\phi$ and $\mu$. A comparison of the unconditional
sum-of-squares function $S(\phi, \mu)$ with the earlier conditional sum-of-squares function
$S_c(\phi, \mu)$ of Equation (7.2.2) on page 154 reveals one simple difference:

$$S(\phi, \mu) = S_c(\phi, \mu) + (1 - \phi^2)(Y_1 - \mu)^2 \tag{7.3.8}$$

Since $S_c(\phi, \mu)$ involves a sum of $n - 1$ components, whereas $(1 - \phi^2)(Y_1 - \mu)^2$ does not
involve $n$, we shall have $S(\phi, \mu) \approx S_c(\phi, \mu)$. Thus the values of $\phi$ and $\mu$ that minimize
$S(\phi, \mu)$ or $S_c(\phi, \mu)$ should be very similar, at least for larger sample sizes. The effect of
the rightmost term in Equation (7.3.8) will be more substantial when the minimum for $\phi$
occurs near the stationarity boundary of $\pm 1$.
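The log-likelihood (7.3.6) can be checked numerically against the factorization used to derive it, namely the marginal normal density of $Y_1$ times the conditional normal densities of $Y_t$ given $Y_{t-1}$. The sketch below does this for a short made-up data vector; the function names and the parameter values are hypothetical and chosen only for illustration.

```r
# Unconditional sum-of-squares function, Equation (7.3.5)
S_uncond <- function(phi, mu, y) {
  n <- length(y)
  sum(((y[-1] - mu) - phi * (y[-n] - mu))^2) + (1 - phi^2) * (y[1] - mu)^2
}

# AR(1) log-likelihood, Equation (7.3.6)
loglik_ar1 <- function(phi, mu, sigma2, y) {
  n <- length(y)
  -n / 2 * log(2 * pi) - n / 2 * log(sigma2) + 0.5 * log(1 - phi^2) -
    S_uncond(phi, mu, y) / (2 * sigma2)
}

# The same quantity built from f(y_1) * prod f(y_t | y_{t-1})
loglik_check <- function(phi, mu, sigma2, y) {
  n <- length(y)
  dnorm(y[1], mu, sqrt(sigma2 / (1 - phi^2)), log = TRUE) +
    sum(dnorm(y[-1], mu + phi * (y[-n] - mu), sqrt(sigma2), log = TRUE))
}

y <- c(5.1, 6.3, 5.8, 4.9, 5.5, 6.0)   # made-up values, for illustration only
print(loglik_ar1(0.5, 5.5, 1.2, y))
print(loglik_check(0.5, 5.5, 1.2, y))
```

The two printed values should agree up to floating-point error, which confirms that the $(1 - \phi^2)$ terms in (7.3.4) and (7.3.5) come entirely from the marginal distribution of $Y_1$.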

Unconditional  Least  Squares 

As a compromise between conditional least squares estimates and full maximum
likelihood estimates, we might consider obtaining unconditional least squares estimates; that
is, estimates minimizing $S(\phi, \mu)$. Unfortunately, the term $(1 - \phi^2)(Y_1 - \mu)^2$ causes the
equations $\partial S/\partial\phi = 0$ and $\partial S/\partial\mu = 0$ to be nonlinear in $\phi$ and $\mu$, and
reparameterization to a constant term $\theta_0 = \mu(1 - \phi)$ does not improve the situation substantially. Thus
minimization must be carried out numerically. The resulting estimates are called
unconditional least squares estimates.

The  derivation  of  the  likelihood  function  for  more  general  ARMA  models  is  con¬ 
siderably  more  involved.  One  derivation  may  be  found  in  Appendix  H:  State  Space 
Models  on  page  222.  We  refer  the  reader  to  Brockwell  and  Davis  (1991)  or  Shumway 
and  Stoffer  (2006)  for  even  more  details. 

7.4  Properties  of  the  Estimates 


The  large-sample  properties  of  the  maximum  likelihood  and  least  squares  (conditional 
or  unconditional)  estimators  are  identical  and  can  be  obtained  by  modifying  standard 
maximum  likelihood  theory.  Details  can  be  found  in  Shumway  and  Stoffer  (2006,  pp. 
125-129).  We  shall  look  at  the  results  and  their  implications  for  simple  ARMA  models. 
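For large $n$, the estimators are approximately unbiased and normally distributed.
The variances and correlations are as follows:

AR(1):
$$Var(\hat{\phi}) \approx \frac{1 - \phi^2}{n} \tag{7.4.9}$$

AR(2):
$$
\begin{cases}
Var(\hat{\phi}_1) \approx Var(\hat{\phi}_2) \approx \dfrac{1 - \phi_2^2}{n} \\[2ex]
Corr(\hat{\phi}_1, \hat{\phi}_2) \approx -\dfrac{\phi_1}{1 - \phi_2} = -\rho_1
\end{cases}
\tag{7.4.10}
$$

MA(1):
$$Var(\hat{\theta}) \approx \frac{1 - \theta^2}{n} \tag{7.4.11}$$

MA(2):
$$
\begin{cases}
Var(\hat{\theta}_1) \approx Var(\hat{\theta}_2) \approx \dfrac{1 - \theta_2^2}{n} \\[2ex]
Corr(\hat{\theta}_1, \hat{\theta}_2) \approx -\dfrac{\theta_1}{1 - \theta_2}
\end{cases}
\tag{7.4.12}
$$

ARMA(1,1):
$$
\begin{cases}
Var(\hat{\phi}) \approx \left[\dfrac{1 - \phi^2}{n}\right]\left[\dfrac{1 - \phi\theta}{\phi - \theta}\right]^2 \\[2ex]
Var(\hat{\theta}) \approx \left[\dfrac{1 - \theta^2}{n}\right]\left[\dfrac{1 - \phi\theta}{\phi - \theta}\right]^2 \\[2ex]
Corr(\hat{\phi}, \hat{\theta}) \approx \dfrac{\sqrt{(1 - \phi^2)(1 - \theta^2)}}{1 - \phi\theta}
\end{cases}
\tag{7.4.13}
$$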
Notice that, in the AR(1) case, the variance of the estimator of $\phi$ decreases as $\phi$
approaches $\pm 1$. Also notice that even though an AR(1) model is a special case of an
AR(2) model, the variance of $\hat{\phi}_1$ shown in Equations (7.4.10) shows that our estimation
of $\phi_1$ will generally suffer if we erroneously fit an AR(2) model when, in fact, $\phi_2 = 0$.
Similar comments could be made about fitting an MA(2) model when an MA(1) would
suffice or fitting an ARMA(1,1) when an AR(1) or an MA(1) is adequate.

For the ARMA(1,1) case, note the denominator of $\phi - \theta$ in the variances in
Equations (7.4.13). If $\phi$ and $\theta$ are nearly equal, the variability in the estimators of $\phi$ and $\theta$ can
be extremely large.

Note that in all of the two-parameter models, the estimates can be highly correlated,
even for very large sample sizes.

The table shown in Exhibit 7.2 gives numerical values for the large-sample
approximate standard deviations of the estimates of $\phi$ in an AR(1) model for several values of
$\phi$ and several sample sizes. Since the values in the table are equal to $\sqrt{(1 - \phi^2)/n}$, they
apply equally well to standard deviations computed according to Equations (7.4.10),
(7.4.11), and (7.4.12).

Thus, in estimating an AR(1) model with, for example, $n = 100$ and $\phi = 0.7$, we can
be about 95% confident that our estimate of $\phi$ is in error by no more than $\pm 2(0.07) =
\pm 0.14$.

Exhibit 7.2  AR(1) Model Large-Sample Standard Deviations of $\hat{\phi}$

                     n
$\phi$         50      100      200
0.4          0.13     0.09     0.06
0.7          0.10     0.07     0.05
0.9          0.06     0.04     0.03
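The entries of Exhibit 7.2 follow directly from Equation (7.4.9). As a brief sketch (the object name `sd_tab` is arbitrary), the whole table can be regenerated by evaluating $\sqrt{(1 - \phi^2)/n}$ over the grid of $\phi$ values and sample sizes:

```r
phi <- c(0.4, 0.7, 0.9)
n <- c(50, 100, 200)

# Large-sample standard deviation of phi-hat, from Equation (7.4.9)
sd_tab <- outer(phi, n, function(p, m) sqrt((1 - p^2) / m))
dimnames(sd_tab) <- list(phi = phi, n = n)

print(round(sd_tab, 2))
```

Replacing `phi` by values of $\phi_2$, $\theta$, or $\theta_2$ gives the corresponding standard deviations under Equations (7.4.10), (7.4.11), and (7.4.12).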

For stationary autoregressive models, the method of moments yields estimators
equivalent to least squares and maximum likelihood, at least for large samples. For
models containing moving average terms, such is not the case. For an MA(1) model, it can
be shown that the large-sample variance of the method-of-moments estimator of $\theta$ is
equal to

$$Var(\hat{\theta}) \approx \frac{1 + \theta^2 + 4\theta^4 + \theta^6 + \theta^8}{n(1 - \theta^2)^2} \tag{7.4.14}$$

Comparing Equation (7.4.14) with that of Equation (7.4.11), we see that the variance for
the method-of-moments estimator is always larger than the variance of the maximum
likelihood estimator. The table in Exhibit 7.3 displays the ratio of the large-sample
standard deviations for the two methods for several values of $\theta$. For example, if $\theta$ is 0.5, the
method-of-moments estimator has a large-sample standard deviation that is 42% larger
than the standard deviation of the estimator obtained using maximum likelihood. It is
clear from these ratios that the method-of-moments estimator should not be used for the
MA(1) model. This same advice applies to all models that contain moving average
terms.


Exhibit 7.3  Method of Moments (MM) vs. Maximum Likelihood (MLE) in
MA(1) Models

$\theta$        $SD_{MM}/SD_{MLE}$
0.25          1.07
0.50          1.42
0.75          2.66
0.90          5.33
Consider the simulated MA(1) series with $\theta = -0.9$. The series was displayed in Exhibit
4.2 on page 59, and we found the method-of-moments estimate of $\theta$ to be a rather poor
-0.554; see Exhibit 7.1 on page 152. In contrast, the maximum likelihood estimate is
-0.915, the unconditional sum-of-squares estimate is -0.923, and the conditional least
squares estimate is -0.879. For this series, the maximum likelihood estimate of -0.915
is closest to the true value used in the simulation. Using Equation (7.4.11) on page 161
and replacing $\theta$ by its estimate, we have a standard error of about

$$\sqrt{\widehat{Var}(\hat{\theta})} \approx \sqrt{\frac{1 - \hat{\theta}^2}{n}} = \sqrt{\frac{1 - (0.915)^2}{120}} \approx 0.04$$

so none of the maximum likelihood, conditional sum-of-squares, or unconditional
sum-of-squares estimates are significantly far from the true value of -0.9.
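The comparison in this paragraph amounts to measuring each estimate's distance from the true value in units of the approximate standard error. A short sketch of that arithmetic, using only the numbers quoted above (the object names are arbitrary, and the maximum likelihood estimate is used as the plug-in value for $\theta$):

```r
# Estimates quoted in the text for the simulated MA(1) series, theta = -0.9
est <- c(ML = -0.915, unconditional_SS = -0.923, conditional_SS = -0.879)
n <- 120

# Equation (7.4.11) with the maximum likelihood estimate plugged in
se <- sqrt((1 - est[["ML"]]^2) / n)
z <- (est - (-0.9)) / se     # distance from the true value in standard errors

print(round(se, 3))
print(round(z, 2))
```

All three standardized distances are well inside $\pm 2$, which is the sense in which none of the estimates is significantly far from $-0.9$.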

The second MA(1) simulation with $\theta = 0.9$ produced the method-of-moments
estimate of 0.719 shown in Exhibit 7.1. The conditional sum-of-squares estimate is 0.958,
the unconditional sum-of-squares estimate is 0.983, and the maximum likelihood
estimate is 1.000. These all have a standard error of about 0.04 as above. Here the
maximum likelihood estimate of $\hat{\theta} = 1$ is a little disconcerting since it corresponds to a
noninvertible model.

The third MA(1) simulation with $\theta = -0.9$ produced a method-of-moments estimate
of -0.719 (see Exhibit 7.1). The maximum likelihood estimate here is -0.894 with a
standard error of about

$$\sqrt{\widehat{Var}(\hat{\theta})} \approx \sqrt{\frac{1 - (0.894)^2}{60}} \approx 0.06$$

For these data, the conditional sum-of-squares estimate is -0.979 and the unconditional
sum-of-squares estimate is -0.961. Of course, with a standard error of this magnitude, it
is unwise to report digits in the estimates of $\theta$ beyond the tenths place.

For  our  simulated  autoregressive  models,  the  results  are  reported  in  Exhibits  7.4 
and  7.5. 


Exhibit 7.4  Parameter Estimation for Simulated AR(1) Models

                 Method-of-    Conditional    Unconditional    Maximum
                 Moments       SS             SS               Likelihood
Parameter $\phi$   Estimate      Estimate       Estimate         Estimate        n
0.9              0.831         0.857          0.911            0.892          60
0.4              0.470         0.473          0.473            0.465          60

> data(ar1.s); data(ar1.2.s)
> ar(ar1.s,order.max=1,aic=F,method='yw')
> ar(ar1.s,order.max=1,aic=F,method='ols')
> ar(ar1.s,order.max=1,aic=F,method='mle')
> ar(ar1.2.s,order.max=1,aic=F,method='yw')
> ar(ar1.2.s,order.max=1,aic=F,method='ols')
> ar(ar1.2.s,order.max=1,aic=F,method='mle')


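From Equation (7.4.9) on page 161, the standard errors for the estimates are

$$\sqrt{\widehat{Var}(\hat{\phi})} \approx \sqrt{\frac{1 - (0.831)^2}{60}} \approx 0.07$$

and

$$\sqrt{\widehat{Var}(\hat{\phi})} \approx \sqrt{\frac{1 - (0.470)^2}{60}} \approx 0.11$$

respectively. Considering the magnitude of these standard errors, all four methods
estimate reasonably well for AR(1) models.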


Exhibit 7.5  Parameter Estimation for a Simulated AR(2) Model

                   Method-of-    Conditional    Unconditional    Maximum
                   Moments       SS             SS               Likelihood
Parameters         Estimates     Estimates      Estimates        Estimate        n
$\phi_1 = 1.5$       1.472         1.5137         1.5183           1.5061        120
$\phi_2 = -0.75$    -0.767        -0.8050        -0.8093          -0.7965        120

> data(ar2.s)
> ar(ar2.s,order.max=2,aic=F,method='yw')
> ar(ar2.s,order.max=2,aic=F,method='ols')
> ar(ar2.s,order.max=2,aic=F,method='mle')


From Equation (7.4.10) on page 161, the standard errors for the estimates are

$$\sqrt{\widehat{Var}(\hat{\phi}_1)} \approx \sqrt{\widehat{Var}(\hat{\phi}_2)} \approx \sqrt{\frac{1 - (0.75)^2}{120}} \approx 0.06$$

Again, considering the size of the standard errors, all four methods estimate reasonably
well for AR(2) models.

As a final example using simulated data, consider the ARMA(1,1) shown in Exhibit
6.14 on page 123. Here $\phi = 0.6$, $\theta = -0.3$, and $n = 100$. Estimates using the various
methods are shown in Exhibit 7.6.
Exhibit 7.6  Parameter Estimation for a Simulated ARMA(1,1) Model

                   Method-of-    Conditional    Unconditional    Maximum
                   Moments       SS             SS               Likelihood
Parameters         Estimates     Estimates      Estimates        Estimate        n
$\phi = 0.6$         0.637         0.5586         0.5691           0.5647        100
$\theta = -0.3$     -0.2066       -0.3669        -0.3618          -0.3557        100

> data(arma11.s)
> arima(arma11.s,order=c(1,0,1),method='CSS')
> arima(arma11.s,order=c(1,0,1),method='ML')


Now let's look at some real time series. The industrial chemical property time series
was first shown in Exhibit 1.3 on page 3. The sample PACF displayed in Exhibit 6.26
on page 135 strongly suggested an AR(1) model for this series. Exhibit 7.7 shows the
estimates of the $\phi$ parameter obtained by four different methods of estimation.


Exhibit 7.7  Parameter Estimation for the Color Property Series

             Method-of-   Conditional   Unconditional   Maximum
             Moments      SS            SS              Likelihood
Parameter    Estimate     Estimate      Estimate        Estimate      n
$\phi$       0.5282       0.5549        0.5890          0.5703        35

> data(color)
> ar(color,order.max=1,AIC=F,method='yw')
> ar(color,order.max=1,AIC=F,method='ols')
> ar(color,order.max=1,AIC=F,method='mle')


Here the standard error of the estimates is about

$$\sqrt{\widehat{\mathrm{Var}}(\hat{\phi})} \approx \sqrt{\frac{1-\hat{\phi}^2}{n}} \approx 0.14$$

so all of the estimates are comparable.

As a second example, consider again the Canadian hare abundance series. As
before, we base all modeling on the square root of the original abundance numbers.
Based on the partial autocorrelation function shown in Exhibit 6.29 on page 137, we
will estimate an AR(3) model. For this illustration, we use maximum likelihood estimation
and show the results obtained from the R software in Exhibit 7.8.

Exhibit 7.8  Maximum Likelihood Estimates from R Software: Hare Series

Coefficients:
          ar1       ar2       ar3   Intercept†
       1.0519   -0.2292   -0.3931      5.6923
s.e.   0.1877    0.2942    0.1915      0.3371

sigma^2 estimated as 1.066:  log-likelihood = -46.54,  AIC = 101.08

† The intercept here is the estimate of the process mean $\mu$, not of $\theta_0$.


> data(hare)
> arima(sqrt(hare),order=c(3,0,0))

Here we see that $\hat{\phi}_1 = 1.0519$, $\hat{\phi}_2 = -0.2292$, and $\hat{\phi}_3 = -0.3930$. We also see that the
estimated noise variance is $\hat{\sigma}_e^2 = 1.066$. Noting the standard errors, the estimates of the
lag 1 and lag 3 autoregressive coefficients are significantly different from zero, as is the
intercept term, but the lag 2 autoregressive parameter estimate is not significant.

The estimated model would be written

$$\sqrt{Y_t} - 5.6923 = 1.0519(\sqrt{Y_{t-1}} - 5.6923) - 0.2292(\sqrt{Y_{t-2}} - 5.6923) - 0.3930(\sqrt{Y_{t-3}} - 5.6923) + e_t$$

or

$$\sqrt{Y_t} = 3.25 + 1.0519\sqrt{Y_{t-1}} - 0.2292\sqrt{Y_{t-2}} - 0.3930\sqrt{Y_{t-3}} + e_t$$

where $Y_t$ is the hare abundance in year $t$ in original terms. Since the lag 2 autoregressive
term is insignificant, we might drop that term (that is, set $\phi_2 = 0$) and obtain new estimates
of $\phi_1$ and $\phi_3$ with this subset model.
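
The constant 3.25 in the second form of the model follows from the relation between the process mean and the constant term, $\theta_0 = \mu(1 - \phi_1 - \phi_2 - \phi_3)$. A short R sketch of that arithmetic, using the estimates quoted in the text (the variable names are illustrative only):

```r
# ML estimates quoted in the text for the AR(3) fit to sqrt(hare)
phi_hat <- c(1.0519, -0.2292, -0.3930)
mu_hat  <- 5.6923   # R reports the estimated process mean as "intercept"

# Constant term of the uncentered model: theta0 = mu * (1 - phi1 - phi2 - phi3)
theta0_hat <- mu_hat * (1 - sum(phi_hat))
print(round(theta0_hat, 2))
```

This is why the value labeled "intercept" in the R output cannot be copied directly into the uncentered form of the model.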

As a last example, we return to the oil price series. The sample ACF shown in
Exhibit 6.32 on page 140 suggested an MA(1) model on the differences of the logs of
the prices. Exhibit 7.9 gives the estimates of $\theta$ by the various methods and, as we have
seen earlier, the method-of-moments estimate differs quite a bit from the others. The
others are nearly equal given their standard errors of about 0.07.


Exhibit 7.9  Estimation for the Difference of Logs of the Oil Price Series

             Method-of-   Conditional   Unconditional   Maximum
             Moments      SS            SS              Likelihood
Parameter    Estimate     Estimate      Estimate        Estimate      n
$\theta$     -0.2225      -0.2731       -0.2954         -0.2956       241


> data(oil.price)
> arima(log(oil.price),order=c(0,1,1),method='CSS')
> arima(log(oil.price),order=c(0,1,1),method='ML')

In Section 7.4, we summarized some approximate normal distribution results for the
estimator $\hat{\gamma}$, where $\gamma$ is the vector consisting of all the ARMA parameters. These normal
approximations are accurate for large samples, and statistical software generally uses
those results in calculating and reporting standard errors. The standard error of some
complex function of the model parameters, for example the quasi-period of the model, if
it exists, is then usually obtained by the delta method. However, the general theory provides
no practical guidance on how large the sample size should be for the normal
approximation to be reliable. Bootstrap methods (Efron and Tibshirani, 1993; Davison
and Hinkley, 2003) provide an alternative approach to assessing the uncertainty of an
estimator and may be more accurate for small samples. There are several variants of the
bootstrap method for dependent data; see Politis (2003). We shall confine our discussion
to the parametric bootstrap that generates the bootstrap time series $Y_1^*, Y_2^*, \ldots, Y_n^*$
by simulation from the fitted ARIMA($p,d,q$) model. (The bootstrap may be done by fixing
the first $p + d$ initial values of $Y^*$ to those of the observed data. For stationary models,
an alternative procedure is to simulate stationary realizations from the fitted model,
which can be done approximately by simulating a long time series from the fitted model
and then deleting the transient initial segment of the simulated data, the so-called
burn-in.) If the errors are assumed to be normally distributed, the errors may be drawn
randomly and with replacement from $N(0, \hat{\sigma}_e^2)$. For the case of an unknown error distribution,
the errors can be drawn randomly and with replacement from the residuals of the
fitted model. For each bootstrap series, let $\hat{\gamma}^*$ be the estimator computed based on the
bootstrap time series data using the method of full maximum likelihood estimation
assuming stationarity. (Other estimation methods may be used.) The bootstrap is replicated,
say, $B$ times. (For example, $B = 1000$.) From the $B$ bootstrap parameter estimates,
we can form an empirical distribution and use it to calibrate the uncertainty in $\hat{\gamma}$. Suppose
we are interested in estimating some function of $\gamma$, say $h(\gamma)$; for example, the
AR(1) coefficient. Using the percentile method, a 95% bootstrap confidence interval for
$h(\gamma)$ can be obtained as the interval from the 2.5 percentile to the 97.5 percentile of the
bootstrap distribution of $h(\hat{\gamma}^*)$.
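
The procedure just described can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used for the exhibits: the data are a simulated AR(1) stand-in (not one of the book's series), the errors are taken as normal, the stationary variant with a burn-in of 100 is used, and $B$ is set to 200 rather than 1000 to keep the run short. All variable names are hypothetical.

```r
set.seed(1)

# Stand-in data: a simulated AR(1) series (an assumption for illustration)
y   <- arima.sim(model = list(ar = 0.7), n = 60)
fit <- arima(y, order = c(1, 0, 0), method = "ML")

phi_hat   <- coef(fit)[["ar1"]]
mu_hat    <- coef(fit)[["intercept"]]   # estimated process mean
sigma_hat <- sqrt(fit$sigma2)

B <- 200
boot_phi <- rep(NA_real_, B)
for (b in seq_len(B)) {
  # Stationary parametric bootstrap: simulate from the fitted model with
  # normal errors and discard a burn-in of 100 values (n.start)
  y_star <- mu_hat + arima.sim(model = list(ar = phi_hat), n = length(y),
                               sd = sigma_hat, n.start = 100)
  # Full ML can occasionally fail; such replicates are left as NA
  fit_star <- tryCatch(arima(y_star, order = c(1, 0, 0), method = "ML"),
                       error = function(e) NULL)
  if (!is.null(fit_star)) boot_phi[b] <- coef(fit_star)[["ar1"]]
}

# Percentile-method 95% bootstrap confidence interval for the AR(1) coefficient
ci <- quantile(boot_phi, c(0.025, 0.975), na.rm = TRUE)
print(ci)
```

Drawing the errors from the residuals instead of a normal distribution would amount to supplying resampled, centered residuals through the `innov` argument of `arima.sim`.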

We illustrate the bootstrap method with the hare data. The bootstrap 95% confidence
intervals reported in the first row of the table in Exhibit 7.10 are based on the
bootstrap obtained by conditioning on the initial three observations and assuming normal
errors. Those in the second row are obtained using the same method except that the
errors are drawn from the residuals. The third and fourth rows report the confidence
intervals based on the stationary bootstrap with a normal error distribution for the third
row and the empirical residual distribution for the fourth row. The fifth row in the table
shows the theoretical 95% confidence intervals based on the large-sample distribution
results for the estimators. In particular, the bootstrap time series for the first bootstrap
method is generated recursively using the equation

$$Y_t^* = \hat{\phi}_1 Y_{t-1}^* + \hat{\phi}_2 Y_{t-2}^* + \hat{\phi}_3 Y_{t-3}^* + \hat{\theta}_0 + e_t^* \qquad (7.6.1)$$

for $t = 4, 5, \ldots, 31$, where the $e_t^*$ are chosen independently from $N(0, \hat{\sigma}_e^2)$; $Y_1^* = Y_1$,
$Y_2^* = Y_2$, $Y_3^* = Y_3$; and the parameters are set to be the estimates from the AR(3)
model fitted to the (square root transformed) hare data with $\hat{\theta}_0 = \hat{\mu}(1 - \hat{\phi}_1 - \hat{\phi}_2 - \hat{\phi}_3)$.
All results are based on about 1000 bootstrap replications, but full maximum likelihood
estimation fails for 6.3%, 6.3%, 3.8%, and 4.8% of 1000 cases for the four bootstrap
methods I, II, III, and IV, respectively.
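
The recursion in Equation (7.6.1) can be written as a small R function. The sketch below is an assumption-laden illustration rather than the book's script: `boot_ar3_conditional` is a hypothetical helper, and `y_obs` is a simulated stand-in for the square-root hare series (the real data live in the TSA package and are not loaded here). The coefficient values are those reported in Exhibit 7.8.

```r
# Generate one conditional bootstrap series from a fitted AR(3) model,
# holding the first three values fixed at the observed data (method I).
boot_ar3_conditional <- function(y, phi, mu, sigma) {
  n      <- length(y)
  theta0 <- mu * (1 - sum(phi))          # constant term implied by the mean
  y_star <- numeric(n)
  y_star[1:3] <- y[1:3]                  # condition on the initial observations
  e_star <- rnorm(n, mean = 0, sd = sigma)   # normal errors, N(0, sigma^2)
  for (t in 4:n) {
    y_star[t] <- phi[1] * y_star[t - 1] + phi[2] * y_star[t - 2] +
                 phi[3] * y_star[t - 3] + theta0 + e_star[t]
  }
  y_star
}

set.seed(42)
phi_hat <- c(1.0519, -0.2292, -0.3930)
# Stand-in for the observed series of length 31 (an assumption)
y_obs  <- 5.6923 + as.numeric(arima.sim(list(ar = phi_hat), n = 31))
y_star <- boot_ar3_conditional(y_obs, phi_hat, mu = 5.6923, sigma = sqrt(1.066))
```

For method II, the call to `rnorm` would be replaced by sampling with replacement from the residuals of the fitted model.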


Exhibit 7.10  Bootstrap and Theoretical Confidence Intervals for the AR(3)
Model Fitted to the Hare Data

Method        ar1              ar2                 ar3                   intercept        noise var.
I             (0.593, 1.269)   (-0.655, 0.237)     (-0.666, -0.018)      (5.115, 6.394)   (0.551, 1.546)
II            (0.612, 1.296)   (-0.702, 0.243)     (-0.669, -0.026)      (5.004, 6.324)   (0.510, 1.510)
III           (0.699, 1.369)   (-0.746, 0.195)     (-0.666, -0.021)      (5.056, 6.379)   (0.499, 1.515)
IV            (0.674, 1.389)   (-0.769, 0.194)     (-0.665, -0.002)      (4.995, 6.312)   (0.477, 1.530)
Theoretical   (0.684, 1.42)    (-0.8058, 0.3474)   (-0.7684, -0.01776)   (5.032, 6.353)   (0.536, 1.597)

> See the Chapter 7 R scripts file for the extensive code required to generate these results.


All four methods yield similar bootstrap confidence intervals, although the conditional
bootstrap approach generally yields slightly narrower confidence intervals. This is
expected, as the conditional bootstrap time series bear more resemblance to each other
because all are subject to identical initial conditions. The bootstrap confidence intervals
are generally slightly wider than their theoretical counterparts that are derived from the
large-sample results. Overall, we can draw the inference that the $\phi_2$ coefficient estimate
is insignificant, whereas both the $\phi_1$ and $\phi_3$ coefficient estimates are significant at the
5% significance level.

The bootstrap method has the advantage of allowing easy construction of confidence
intervals for a model characteristic that is a nonlinear function of the model
parameters. For example, the characteristic AR polynomial of the fitted AR(3) model
for the hare data admits a pair of complex roots. Indeed, the roots are $0.84 \pm 0.647i$ and
$-2.26$, where $i = \sqrt{-1}$. The two complex roots can be written in polar form: $1.06\exp(\pm 0.657i)$.
As in the discussion of the quasi-period for the AR(2) model on page 74, the
quasi-period of the fitted AR(3) model can be defined as $2\pi/0.657 \approx 9.57$. Thus, the fitted
model suggests that the hare abundance underwent cyclical fluctuation with a period
of about 9.57 years. The interesting question of constructing a 95% confidence interval
for the quasi-period could be studied using the delta method. However, this will be quite
complex, as the quasi-period is a complicated function of the parameters. But the bootstrap
provides a simple solution: For each set of bootstrap parameter estimates, we can
compute the quasi-period and hence obtain the bootstrap distribution of the
quasi-period. Confidence intervals for the quasi-period can then be constructed using
the percentile method, and the shape of the distribution can be explored via the histogram
of the bootstrap quasi-period estimates. (Note that the quasi-period will be undefined
whenever the roots of the AR characteristic equation are all real numbers.) Among
the 1000 stationary bootstrap time series obtained by simulating from the fitted model
with the errors drawn randomly from the residuals with replacement, 952 series lead to
successful full maximum likelihood estimation. All but one of the 952 series have
well-defined quasi-periods, and the histogram of these is shown in Exhibit 7.11. The
histogram shows that the sampling distribution of the quasi-period estimate is slightly
skewed to the right.† The Q-Q normal plot (Exhibit 7.12) suggests that the quasi-period
estimator has, furthermore, a thick-tailed distribution. Thus, the delta method and the
corresponding normal distribution approximation may be inappropriate for approximating
the sampling distribution of the quasi-period estimator. Finally, using the percentile
method, a 95% confidence interval of the quasi-period is found to be (7.84, 11.34).
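
The quasi-period calculation applied to each set of bootstrap estimates can be sketched with base R's `polyroot`. The function name `quasi_period` is a hypothetical helper introduced for illustration; it assumes the AR characteristic polynomial is $1 - \phi_1 x - \cdots - \phi_p x^p$ and, as a simplification, uses the first complex root found.

```r
# Quasi-period implied by AR coefficients phi = (phi1, ..., phip).
# Returns NA when all roots of the AR characteristic polynomial are real,
# in which case the quasi-period is undefined.
quasi_period <- function(phi) {
  roots <- polyroot(c(1, -phi))            # roots of 1 - phi1*x - ... - phip*x^p
  cplx  <- roots[abs(Im(roots)) > 1e-8]    # keep genuinely complex roots
  if (length(cplx) == 0) return(NA_real_)
  2 * pi / abs(Arg(cplx[1]))               # 2*pi divided by the polar angle
}

# Estimates quoted in the text for the AR(3) model of sqrt(hare)
period_hat <- quasi_period(c(1.0519, -0.2292, -0.3930))
print(round(period_hat, 2))
```

Applying such a function to each vector of bootstrap estimates and taking the 2.5 and 97.5 percentiles of the non-missing results gives a percentile interval of the kind reported above.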


Exhibit 7.11  Histogram of Bootstrap Quasi-period Estimates

[Histogram; horizontal axis: Quasi-period, ticks at 6, 8, 10, 12, 14]

> win.graph(width=3.9,height=3.8,pointsize=8)
> hist(period.replace,prob=T,xlab='Quasi-period',axes=F,
   xlim=c(5,16))
> axis(2); axis(1,c(4,6,8,10,12,14,16),c(4,6,8,10,12,14,NA))


† However, see the discussion below Equation (13.5.9) on page 338 where it is argued that,
from the perspective of frequency domain, there is a small parametric region corresponding
to complex roots and yet the associated quasi-period may not be physically meaningful.
This illustrates the subtlety of the concept of quasi-period.




Exhibit 7.12  Q-Q Normal Plot of Bootstrap Quasi-period Estimates

[Q-Q normal plot; horizontal axis: Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(period.replace); qqline(period.replace)


7.7  Summary 


This chapter delved into the estimation of the parameters of ARIMA models. We considered
estimation criteria based on the method of moments, various types of least
squares, and maximizing the likelihood function. The properties of the various estimators
were given, and the estimators were illustrated both with simulated and actual time
series data. Bootstrapping with ARIMA models was also discussed and illustrated.

7.1 From a series of length 100, we have computed $r_1 = 0.8$, $r_2 = 0.5$, $r_3 = 0.4$, $\bar{Y} = 2$,
    and a sample variance of 5. If we assume that an AR(2) model with a constant
    term is appropriate, how can we get (simple) estimates of $\phi_1$, $\phi_2$, $\theta_0$, and $\sigma_e^2$?

7.2 Assuming that the following data arise from a stationary process, calculate
    method-of-moments estimates of $\mu$, $\gamma_0$, and $\rho_1$: 6, 5, 4, 6, 4.

7.3 If $\{Y_t\}$ satisfies an AR(1) model with $\phi$ of about 0.7, how long of a series do we
    need to estimate $\phi = \rho_1$ with 95% confidence that our estimation error is no more
    than ±0.1?

7.4 Consider an MA(1) process for which it is known that the process mean is zero.
    Based on a series of length $n = 3$, we observe $Y_1 = 0$, $Y_2 = -1$, and $Y_3 = \tfrac{1}{2}$.
    (a) Show that the conditional least-squares estimate of $\theta$ is $\tfrac{1}{2}$.
    (b) Find an estimate of the noise variance. (Hint: Iterative methods are not needed
        in this simple case.)

7.5 Given the data $Y_1 = 10$, $Y_2 = 9$, and $Y_3 = 9.5$, we wish to fit an IMA(1,1) model
    without a constant term.
    (a) Find the conditional least squares estimate of $\theta$. (Hint: Do Exercise 7.4 first.)
    (b) Estimate $\sigma_e^2$.

7.6 Consider two different parameterizations of the AR(1) process with nonzero
    mean:
        Model I.   $Y_t - \mu = \phi(Y_{t-1} - \mu) + e_t$.
        Model II.  $Y_t = \phi Y_{t-1} + \theta_0 + e_t$.
    We want to estimate $\phi$ and $\mu$ or $\phi$ and $\theta_0$ using conditional least squares, conditioning
    on $Y_1$. Show that with Model I we are led to solve nonlinear equations to obtain the
    estimates, while with Model II we need only solve linear equations.

7.7 Verify Equation (7.1.4) on page 150.

7.8 Consider an ARMA(1,1) model with $\phi = 0.5$ and $\theta = 0.45$.
    (a) For $n = 48$, evaluate the variances and correlation of the maximum likelihood
        estimators of $\phi$ and $\theta$ using Equations (7.4.13) on page 161. Comment on the
        results.
    (b) Repeat part (a) but now with $n = 120$. Comment on the new results.

7.9 Simulate an MA(1) series with $\theta = 0.8$ and $n = 48$.
    (a) Find the method-of-moments estimate of $\theta$.
    (b) Find the conditional least squares estimate of $\theta$ and compare it with part (a).
    (c) Find the maximum likelihood estimate of $\theta$ and compare it with parts (a) and
        (b).
    (d) Repeat parts (a), (b), and (c) with a new simulated series using the same
        parameters and same sample size. Compare your results with your results
        from the first simulation.

7.10 Simulate an MA(1) series with $\theta = -0.6$ and $n = 36$.
    (a) Find the method-of-moments estimate of $\theta$.
    (b) Find the conditional least squares estimate of $\theta$ and compare it with part (a).
    (c) Find the maximum likelihood estimate of $\theta$ and compare it with parts (a) and
        (b).
    (d) Repeat parts (a), (b), and (c) with a new simulated series using the same
        parameters and same sample size. Compare your results with your results
        from the first simulation.

7.11 Simulate an MA(1) series with $\theta = -0.6$ and $n = 48$.
    (a) Find the maximum likelihood estimate of $\theta$.
    (b) If your software permits, repeat part (a) many times with a new simulated
        series using the same parameters and same sample size.
    (c) Form the sampling distribution of the maximum likelihood estimates of $\theta$.
    (d) Are the estimates (approximately) unbiased?
    (e) Calculate the variance of your sampling distribution and compare it with the
        large-sample result in Equation (7.4.11) on page 161.

7.12 Repeat Exercise 7.11 using a sample size of $n = 120$.

7.13 Simulate an AR(1) series with $\phi = 0.8$ and $n = 48$.
    (a) Find the method-of-moments estimate of $\phi$.
    (b) Find the conditional least squares estimate of $\phi$ and compare it with part (a).
    (c) Find the maximum likelihood estimate of $\phi$ and compare it with parts (a) and
        (b).
    (d) Repeat parts (a), (b), and (c) with a new simulated series using the same
        parameters and same sample size. Compare your results with your results
        from the first simulation.

7.14 Simulate an AR(1) series with $\phi = -0.5$ and $n = 60$.
    (a) Find the method-of-moments estimate of $\phi$.
    (b) Find the conditional least squares estimate of $\phi$ and compare it with part (a).
    (c) Find the maximum likelihood estimate of $\phi$ and compare it with parts (a) and
        (b).
    (d) Repeat parts (a), (b), and (c) with a new simulated series using the same
        parameters and same sample size. Compare your results with your results
        from the first simulation.

7.15 Simulate an AR(1) series with $\phi = 0.7$ and $n = 100$.
    (a) Find the maximum likelihood estimate of $\phi$.
    (b) If your software permits, repeat part (a) many times with a new simulated
        series using the same parameters and same sample size.
    (c) Form the sampling distribution of the maximum likelihood estimates of $\phi$.
    (d) Are the estimates (approximately) unbiased?
    (e) Calculate the variance of your sampling distribution and compare it with the
        large-sample result in Equation (7.4.9) on page 161.

7.16 Simulate an AR(2) series with $\phi_1 = 0.6$, $\phi_2 = 0.3$, and $n = 60$.
    (a) Find the method-of-moments estimates of $\phi_1$ and $\phi_2$.
    (b) Find the conditional least squares estimates of $\phi_1$ and $\phi_2$ and compare them
        with part (a).
    (c) Find the maximum likelihood estimates of $\phi_1$ and $\phi_2$ and compare them with
        parts (a) and (b).
    (d) Repeat parts (a), (b), and (c) with a new simulated series using the same
        parameters and same sample size. Compare these results to your results from
        the first simulation.

7.17 Simulate an ARMA(1,1) series with $\phi = 0.7$, $\theta = 0.4$, and $n = 72$.
    (a) Find the method-of-moments estimates of $\phi$ and $\theta$.
    (b) Find the conditional least squares estimates of $\phi$ and $\theta$ and compare them with
        part (a).
    (c) Find the maximum likelihood estimates of $\phi$ and $\theta$ and compare them with
        parts (a) and (b).
    (d) Repeat parts (a), (b), and (c) with a new simulated series using the same
        parameters and same sample size. Compare your new results with your results
        from the first simulation.

7.18 Simulate an AR(1) series with $\phi = 0.6$, $n = 36$ but with error terms from a
    t-distribution with 3 degrees of freedom.
    (a) Display the sample PACF of the series. Is an AR(1) model suggested?
    (b) Estimate $\phi$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.19 Simulate an MA(1) series with $\theta = -0.8$, $n = 60$ but with error terms from a
    t-distribution with 4 degrees of freedom.
    (a) Display the sample ACF of the series. Is an MA(1) model suggested?
    (b) Estimate $\theta$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.20 Simulate an AR(2) series with $\phi_1 = 1.0$, $\phi_2 = -0.6$, $n = 48$ but with error terms
    from a t-distribution with 5 degrees of freedom.
    (a) Display the sample PACF of the series. Is an AR(2) model suggested?
    (b) Estimate $\phi_1$ and $\phi_2$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.21 Simulate an ARMA(1,1) series with $\phi = 0.7$, $\theta = -0.6$, $n = 48$ but with error terms
    from a t-distribution with 6 degrees of freedom.
    (a) Display the sample EACF of the series. Is an ARMA(1,1) model suggested?
    (b) Estimate $\phi$ and $\theta$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.22 Simulate an AR(1) series with $\phi = 0.6$, $n = 36$ but with error terms from a
    chi-square distribution with 6 degrees of freedom.
    (a) Display the sample PACF of the series. Is an AR(1) model suggested?
    (b) Estimate $\phi$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.23 Simulate an MA(1) series with $\theta = -0.8$, $n = 60$ but with error terms from a
    chi-square distribution with 7 degrees of freedom.
    (a) Display the sample ACF of the series. Is an MA(1) model suggested?
    (b) Estimate $\theta$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.24 Simulate an AR(2) series with $\phi_1 = 1.0$, $\phi_2 = -0.6$, $n = 48$ but with error terms
    from a chi-square distribution with 8 degrees of freedom.
    (a) Display the sample PACF of the series. Is an AR(2) model suggested?
    (b) Estimate $\phi_1$ and $\phi_2$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new simulated series under the same conditions.

7.25 Simulate an ARMA(1,1) series with $\phi = 0.7$, $\theta = -0.6$, $n = 48$ but with error terms
    from a chi-square distribution with 9 degrees of freedom.
    (a) Display the sample EACF of the series. Is an ARMA(1,1) model suggested?
    (b) Estimate $\phi$ and $\theta$ from the series and comment on the results.
    (c) Repeat parts (a) and (b) with a new series under the same conditions.

7.26 Consider the AR(1) model specified for the color property time series displayed
    in Exhibit 1.3 on page 3. The data are in the file named color.
    (a) Find the method-of-moments estimate of $\phi$.
    (b) Find the maximum likelihood estimate of $\phi$ and compare it with part (a).

7.27 Exhibit 6.31 on page 139 suggested specifying either an AR(1) or possibly an
    AR(4) model for the difference of the logarithms of the oil price series. The data
    are in the file named oil.price.
    (a) Estimate both of these models using maximum likelihood and compare the
        results using the AIC criterion.
    (b) Exhibit 6.32 on page 140 suggested specifying an MA(1) model for the difference
        of the logs. Estimate this model by maximum likelihood and compare it with
        your results in part (a).

7.28 The data file named deere3 contains 57 consecutive values from a complex
    machine tool at Deere & Co. The values given are deviations from a target value
    in units of ten millionths of an inch. The process employs a control mechanism
    that resets some of the parameters of the machine tool depending on the magnitude
    of deviation from target of the last item produced.
    (a) Estimate the parameters of an AR(1) model for this series.
    (b) Estimate the parameters of an AR(2) model for this series and compare the
        results with those in part (a).

7.29 The data file named robot contains a time series obtained from an industrial robot.
    The robot was put through a sequence of maneuvers, and the distance from a
    desired ending point was recorded in inches. This was repeated 324 times to form
    the time series.
    (a) Estimate the parameters of an AR(1) model for these data.
    (b) Estimate the parameters of an IMA(1,1) model for these data.
    (c) Compare the results from parts (a) and (b) in terms of AIC.

7.30 The data file named days contains accounting data from the Winegard Co. of Burlington,
    Iowa. The data are the number of days until Winegard receives payment
    for 130 consecutive orders from a particular distributor of Winegard products.
    (The name of the distributor must remain anonymous for confidentiality reasons.)
    The time series contains outliers that are quite obvious in the time series plot.
    (a) Replace each of the unusual values with a value of 35 days, a much more typical
        value, and then estimate the parameters of an MA(2) model.
    (b) Now assume an MA(5) model and estimate the parameters. Compare these
        results with those obtained in part (a).

7.31 Simulate a time series of length $n = 48$ from an AR(1) model with $\phi = 0.7$. Use
    that series as if it were real data. Now compare the theoretical asymptotic distribution
    of the estimator of $\phi$ with the distribution of the bootstrap estimator of $\phi$.

7.32 The industrial color property time series was fitted quite well by an AR(1) model.
    However, the series is rather short, with $n = 35$. Compare the theoretical asymptotic
    distribution of the estimator of $\phi$ with the distribution of the bootstrap estimator
    of $\phi$. The data are in the file named color.

We have now discussed methods for specifying models and for efficiently estimating the
parameters in those models. Model diagnostics, or model criticism, is concerned with
testing the goodness of fit of a model and, if the fit is poor, suggesting appropriate modifications.
We shall present two complementary approaches: analysis of residuals from
the fitted model and analysis of overparameterized models; that is, models that are more
general than the proposed model but that contain the proposed model as a special case.

8.1  Residual  Analysis 


We  already  used  the  basic  ideas  of  residual  analysis  in  Section  3.6  on  page  42  when  we 
checked  the  adequacy  of  fitted  deterministic  trend  models.  With  autoregressive  models, 
residuals  are  defined  in  direct  analogy  to  that  earlier  work.  Consider  in  particular  an 
AR(2)  model  with  a  constant  term: 

$$Y_t = \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \theta_0 + e_t \qquad (8.1.1)$$

Having estimated $\phi_1$, $\phi_2$, and $\theta_0$, the residuals are defined as

$$\hat{e}_t = Y_t - \hat{\phi}_1 Y_{t-1} - \hat{\phi}_2 Y_{t-2} - \hat{\theta}_0 \qquad (8.1.2)$$

For general ARMA models containing moving average terms, we use the inverted,
infinite autoregressive form of the model to define residuals. For simplicity, we assume
that $\theta_0$ is zero. From the inverted form of the model, Equation (4.5.5) on page 80, we
have

$$Y_t = \pi_1 Y_{t-1} + \pi_2 Y_{t-2} + \pi_3 Y_{t-3} + \cdots + e_t$$

so that the residuals are defined as

$$\hat{e}_t = Y_t - \hat{\pi}_1 Y_{t-1} - \hat{\pi}_2 Y_{t-2} - \hat{\pi}_3 Y_{t-3} - \cdots \qquad (8.1.3)$$

Here the $\pi$'s are not estimated directly but rather implicitly as functions of the $\phi$'s and
$\theta$'s. In fact, the residuals are not calculated using this equation but as a by-product of the
estimation of the $\phi$'s and $\theta$'s. In Chapter 9, we shall argue that

$$\hat{Y}_t = \hat{\pi}_1 Y_{t-1} + \hat{\pi}_2 Y_{t-2} + \hat{\pi}_3 Y_{t-3} + \cdots$$

is the best forecast of $Y_t$ based on $Y_{t-1}, Y_{t-2}, Y_{t-3}, \ldots$. Thus Equation (8.1.3) can be
rewritten as

residual = actual - predicted

in direct analogy with regression models. Compare this with Section 3.6 on page 42.
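
For a pure AR(2) model, the definition in Equation (8.1.2) can be compared with the residuals that R returns as a by-product of estimation. The sketch below is an illustration under assumptions: the series is simulated (not one of the book's data sets), the variable names are hypothetical, and the comparison is restricted to $t \ge 3$, where the two calculations are expected to agree up to numerical error.

```r
set.seed(123)

# Stand-in data: simulated AR(2) series with mean 10 (an assumption)
y   <- arima.sim(model = list(ar = c(0.6, -0.3)), n = 200) + 10
fit <- arima(y, order = c(2, 0, 0), method = "ML")

phi_hat    <- unname(coef(fit)[c("ar1", "ar2")])
mu_hat     <- unname(coef(fit)["intercept"])    # R's "intercept" is the mean
theta0_hat <- mu_hat * (1 - sum(phi_hat))       # constant term of (8.1.1)

# Residuals from the definition: actual minus predicted, for t = 3, ..., n
yv    <- as.numeric(y)
t_idx <- 3:length(yv)
e_manual <- yv[t_idx] - phi_hat[1] * yv[t_idx - 1] -
            phi_hat[2] * yv[t_idx - 2] - theta0_hat

# Residuals reported by the fitting routine
e_fit <- as.numeric(residuals(fit))[t_idx]

max_diff <- max(abs(e_manual - e_fit))
print(max_diff)
```

The first two residuals are excluded because they depend on how the routine treats the unobserved values before the start of the series.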

If  the  model  is  correctly  specified  and  the  parameter  estimates  are  reasonably  close 
to  the  true  values,  then  the  residuals  should  have  nearly  the  properties  of  white  noise. 
They  should  behave  roughly  like  independent,  identically  distributed  normal  variables 
with  zero  means  and  common  standard  deviations.  Deviations  from  these  properties  can 
help  us  discover  a  more  appropriate  model. 

Plots  of  the  Residuals 

Our  first  diagnostic  check  is  to  inspect  a  plot  of  the  residuals  over  time.  If  the  model  is 
adequate,  we  expect  the  plot  to  suggest  a  rectangular  scatter  around  a  zero  horizontal 
level  with  no  trends  whatsoever. 

Exhibit 8.1 shows such a plot for the standardized residuals from the AR(1) model
fitted to the industrial color property series. Standardization allows us to see residuals of
unusual size much more easily. The parameters were estimated using maximum likelihood.
This plot supports the model, as no trends are present.


Exhibit 8.1  Standardized Residuals from AR(1) Model of Color

> win.graph(width=4.875,height=3,pointsize=8)
> data(color)
> m1.color=arima(color,order=c(1,0,0)); m1.color
> plot(rstandard(m1.color),ylab='Standardized Residuals',
   type='o'); abline(h=0)


As a second example, we consider the Canadian hare abundance series. We estimate
a subset AR(3) model with $\phi_2$ set to zero, as suggested by the discussion following
Exhibit 7.8 on page 166. The estimated model is

$$\sqrt{Y_t} = 3.483 + 0.919\sqrt{Y_{t-1}} - 0.5313\sqrt{Y_{t-3}} + e_t \qquad (8.1.4)$$

and the time series plot of the standardized residuals from this model is shown in
Exhibit 8.2. Here we see possible reduced variation in the middle of the series and
increased variation near the end of the series, which is not exactly an ideal plot of residuals.†


Exhibit  8.2  Standardized  Residuals  from  AR(3)  Model  for  Sqrt(Hare) 


[Plot: standardized residuals versus Time, 1905 to 1935]

> data(hare)
> m1.hare=arima(sqrt(hare),order=c(3,0,0)); m1.hare
> m2.hare=arima(sqrt(hare),order=c(3,0,0),fixed=c(NA,0,NA,NA))
> m2.hare
> # Note that the intercept term given in R is actually the mean
> # in the centered form of the ARMA model; that is, if
> # y(t)=sqrt(hare)-intercept, then the model is
> # y(t)=0.919*y(t-1)-0.5313*y(t-3)+e(t)
> # So the 'true' intercept equals 5.6889*(1-0.919+0.5313)=3.483
> plot(rstandard(m2.hare),ylab='Standardized Residuals',type='o')
> abline(h=0)


Exhibit  8.3  displays  the  time  series  plot  of  the  standardized  residuals  from  the 
IMA(1,1)  model  estimated  for  the  logarithms  of  the  oil  price  time  series.  The  model  was 
fitted using maximum likelihood estimation. There are at least two or three residuals
early in the series with magnitudes larger than 3, which is very unusual in a standard normal
distribution.‡ Ideally, we should go back to those months and try to learn what outside
factors  may  have  influenced  unusually  large  drops  or  unusually  large  increases  in  the 
price  of  oil. 


† The seemingly large negative standardized residuals are not outliers according to the
Bonferroni outlier criterion with critical values ±3.15.

‡ The Bonferroni critical values with n = 241 and $\alpha$ = 0.05 are ±3.71, so the outliers do
appear to be real. We will model them in Chapter 11.
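
The two Bonferroni critical values quoted in the footnotes can be checked with a short calculation. The sketch below assumes the criterion is the two-sided normal quantile at level $\alpha/(2n)$, which is consistent with the reported ±3.15 (n = 31) and ±3.71 (n = 241); the function name bonferroni_crit is ours and is not part of any package.

```r
# Two-sided Bonferroni critical value for n standardized residuals
# at overall level alpha (assumed form: normal quantile at 1 - alpha/(2n)).
bonferroni_crit <- function(n, alpha = 0.05) {
  qnorm(1 - alpha / (2 * n))
}

crit_hare <- bonferroni_crit(31)    # hare series, n = 31
crit_oil  <- bonferroni_crit(241)   # oil price series, n = 241
round(c(hare = crit_hare, oil = crit_oil), 2)
```

A standardized residual is flagged only if its magnitude exceeds the critical value for its series, which is why residuals near 3 are not alarming for the short hare series.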
Exhibit 8.3 Standardized Residuals from Log Oil Price IMA(1,1) Model

[Plot: standardized residuals versus Time, 1990 to 2005]

> data(oil.price)
> m1.oil=arima(log(oil.price),order=c(0,1,1))
> plot(rstandard(m1.oil),ylab='Standardized residuals',type='l')
> abline(h=0)


Normality  of  the  Residuals 

As we saw in Chapter 3, quantile-quantile plots are an effective tool for assessing
normality. Here we apply them to residuals.

A quantile-quantile plot of the residuals from the AR(1) model estimated for the
industrial color property series is shown in Exhibit 8.4. The points seem to follow the
straight line fairly closely, especially the extreme values. This graph would not lead us
to reject normality of the error terms in this model. In addition, the Shapiro-Wilk
normality test applied to the residuals produces a test statistic of W = 0.9754, which
corresponds to a p-value of 0.6057, and we would not reject normality based on this test.
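
The same two normality checks can be tried on any fitted model. The following is a minimal sketch on a simulated AR(1) series, used here as a stand-in so that it runs without the book's data files; the seed, the value $\phi$ = 0.5, and n = 100 are arbitrary assumptions, so the output will not match the values reported for the color series.

```r
set.seed(123)                                       # arbitrary seed, for reproducibility only
y   <- arima.sim(model = list(ar = 0.5), n = 100)   # simulated stand-in series
fit <- arima(y, order = c(1, 0, 0))                 # default method: CSS start, then ML
res <- residuals(fit)

qqnorm(res); qqline(res)                            # quantile-quantile plot with reference line
sw <- shapiro.test(as.numeric(res))                 # Shapiro-Wilk test of normality
sw$statistic
sw$p.value
```

A large p-value, together with points lying close to the reference line, gives no reason to reject normality of the error terms.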


Exhibit 8.4 Quantile-Quantile Plot: Residuals from AR(1) Color Model

[Plot: horizontal axis Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(residuals(m1.color)); qqline(residuals(m1.color))


The quantile-quantile plot for the residuals from the AR(3) model for the square
root of the hare abundance time series is displayed in Exhibit 8.5. Here the extreme values
look suspect. However, the sample is small (n = 31) and, as stated earlier, the
Bonferroni criteria for outliers do not indicate cause for alarm.


Exhibit  8.5  Quantile-Quantile  Plot:  Residuals  from  AR(3)  for  Hare 


[Plot: horizontal axis Theoretical Quantiles]

> qqnorm(residuals(m1.hare)); qqline(residuals(m1.hare))


Exhibit 8.6 gives the quantile-quantile plot for the residuals from the IMA(1,1)
model that was used to model the logarithms of the oil price series. Here the outliers are
quite prominent, and we will deal with them in Chapter 11.

Exhibit 8.6 Quantile-Quantile Plot: Residuals from IMA(1,1) Model for Oil

[Plot: horizontal axis Theoretical Quantiles]

> qqnorm(residuals(m1.oil)); qqline(residuals(m1.oil))


Autocorrelation  of  the  Residuals 

To check on the independence of the noise terms in the model, we consider the sample
autocorrelation function of the residuals, denoted $\hat{r}_k$. From Equation (6.1.3) on
page 110, we know that for true white noise and large n, the sample autocorrelations are
approximately uncorrelated and normally distributed with zero means and variance 1/n.
Unfortunately, even residuals from a correctly specified model with efficiently estimated
parameters have somewhat different properties. This was first explored for
multiple-regression models in a series of papers by Durbin and Watson (1950, 1951, 1971)
and for autoregressive models in Durbin (1970). The key reference on the distribution of
residual autocorrelations in ARIMA models is Box and Pierce (1970), the results of
which were generalized in McLeod (1978).

Generally speaking, the residual autocorrelations are approximately normally distributed
with zero means; however, for small lags k and j, the variance of $\hat{r}_k$ can be substantially
less than 1/n and the estimates $\hat{r}_k$ and $\hat{r}_j$ can be highly correlated. For larger lags, the
approximate variance 1/n does apply, and further $\hat{r}_k$ and $\hat{r}_j$ are approximately uncorrelated.

As an example of these results, consider a correctly specified and efficiently estimated
AR(1) model. It can be shown that, for large n,

$$Var(\hat{r}_1) \approx \frac{\phi^2}{n} \qquad (8.1.5)$$

$$Var(\hat{r}_k) \approx \frac{1 - (1 - \phi^2)\phi^{2k-2}}{n} \quad \text{for } k > 1 \qquad (8.1.6)$$

$$Corr(\hat{r}_1, \hat{r}_k) \approx -\text{sign}(\phi)\,\frac{(1 - \phi^2)\phi^{k-2}}{\sqrt{1 - (1 - \phi^2)\phi^{2k-2}}} \quad \text{for } k > 1 \qquad (8.1.7)$$

where

$$\text{sign}(\phi) = \begin{cases} 1 & \text{if } \phi > 0 \\ 0 & \text{if } \phi = 0 \\ -1 & \text{if } \phi < 0 \end{cases}$$

The table in Exhibit 8.7 illustrates these formulas for a variety of values of $\phi$ and k.
Notice that $Var(\hat{r}_k) \approx 1/n$ is a reasonable approximation for k ≥ 2 over a wide range of
$\phi$-values.


Exhibit 8.7 Approximations for Residual Autocorrelations in AR(1) Models

       Standard deviation of $\hat{r}_k$ times $\sqrt{n}$      Correlation of $\hat{r}_1$ with $\hat{r}_k$
 k     φ = 0.3     0.5     0.7     0.9          φ = 0.3     0.5     0.7     0.9
 1       0.30     0.50    0.70    0.90            1.00     1.00    1.00    1.00
 2       0.96     0.90    0.87    0.92           -0.95    -0.83   -0.59   -0.21
 3       1.00     0.98    0.94    0.94           -0.27    -0.38   -0.38   -0.18
 4       1.00     0.99    0.97    0.95           -0.08    -0.19   -0.26   -0.16
 5       1.00     1.00    0.99    0.96           -0.02    -0.09   -0.18   -0.14
 6       1.00     1.00    0.99    0.97           -0.01    -0.05   -0.12   -0.13
 7       1.00     1.00    1.00    0.97           -0.00    -0.02   -0.09   -0.12
 8       1.00     1.00    1.00    0.98           -0.00    -0.01   -0.06   -0.10
 9       1.00     1.00    1.00    0.99           -0.00    -0.00   -0.03   -0.08


If we apply these results to the AR(1) model that was estimated for the industrial
color property time series with $\hat{\phi}$ = 0.57 and n = 35, we obtain the results shown in
Exhibit 8.8.


Exhibit 8.8 Approximate Standard Deviations of Residual ACF Values

Lag k                        1       2       3       4       5      >5
$\sqrt{Var(\hat{r}_k)}$    0.096   0.149   0.163   0.167   0.168   0.169
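
The entries of Exhibits 8.7 and 8.8 follow directly from Equations (8.1.5) through (8.1.7). The sketch below evaluates those approximations; the helper names sd_rk and corr_r1_rk are ours, introduced only for illustration.

```r
# Large-sample approximations (8.1.5)-(8.1.7) for a correctly specified AR(1) model.
sd_rk <- function(phi, k, n) {
  v <- ifelse(k == 1, phi^2, 1 - (1 - phi^2) * phi^(2 * k - 2))
  sqrt(v / n)
}
corr_r1_rk <- function(phi, k) {   # intended for lags k > 1
  -sign(phi) * (1 - phi^2) * phi^(k - 2) /
    sqrt(1 - (1 - phi^2) * phi^(2 * k - 2))
}

phis <- c(0.3, 0.5, 0.7, 0.9)

# Left half of Exhibit 8.7: standard deviation times sqrt(n), so use n = 1.
tab7_sd <- round(outer(1:9, phis, function(k, phi) sd_rk(phi, k, n = 1)), 2)
# Right half of Exhibit 8.7 for lags 2 to 9.
tab7_corr <- round(outer(2:9, phis, function(k, phi) corr_r1_rk(phi, k)), 2)

# Exhibit 8.8: color series with phi-hat = 0.57 and n = 35.
tab8 <- round(sd_rk(0.57, 1:5, 35), 3)
large_lag <- round(1 / sqrt(35), 3)
tab8; large_lag
```

Comparing tab8 with large_lag shows how much narrower the lag 1 limits should be than the usual large-lag value.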


A graph of the sample ACF of these residuals is shown in Exhibit 8.9. The dashed
horizontal lines plotted are based on the large lag standard error of $\pm 2/\sqrt{n}$. There is no
evidence of autocorrelation in the residuals of this model.
Exhibit 8.9 Sample ACF of Residuals from AR(1) Model for Color

[Plot: residual ACF versus Lag]

> win.graph(width=4.875,height=3,pointsize=8)
> acf(residuals(m1.color))


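For an AR(2) model, it can be shown that

$$Var(\hat{r}_1) \approx \frac{\phi_2^2}{n} \qquad (8.1.8)$$

and

$$Var(\hat{r}_2) \approx \frac{\phi_2^2 + \phi_1^2(1 + \phi_2)^2}{n} \qquad (8.1.9)$$

If the AR(2) parameters are not too close to the stationarity boundary shown in Exhibit
4.17 on page 72, then

$$Var(\hat{r}_k) \approx \frac{1}{n} \quad \text{for } k \ge 3 \qquad (8.1.10)$$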

If we fit an AR(2) model† by maximum likelihood to the square root of the hare
abundance series, we find that $\hat{\phi}_1 = 1.351$ and $\hat{\phi}_2 = -0.776$. Thus we have

$$\sqrt{Var(\hat{r}_1)} \approx \frac{|-0.776|}{\sqrt{35}} = 0.131$$

$$\sqrt{Var(\hat{r}_2)} \approx \sqrt{\frac{(-0.776)^2 + (1.351)^2(1 + (-0.776))^2}{35}} = 0.141$$

$$\sqrt{Var(\hat{r}_k)} \approx 1/\sqrt{35} = 0.169 \quad \text{for } k \ge 3$$

† The AR(2) model is not quite as good as the AR(3) model that we estimated earlier, but it
still fits quite well and serves as a reasonable example here.
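
The three standard deviations above are easy to reproduce. This short sketch simply evaluates Equations (8.1.8) through (8.1.10) with the estimates and the value of n used in the text.

```r
phi1 <- 1.351; phi2 <- -0.776; n <- 35               # values as used in the text

sd_r1 <- abs(phi2) / sqrt(n)                         # from (8.1.8)
sd_r2 <- sqrt((phi2^2 + phi1^2 * (1 + phi2)^2) / n)  # from (8.1.9)
sd_rk <- 1 / sqrt(n)                                 # from (8.1.10), lags 3 and beyond

round(c(lag1 = sd_r1, lag2 = sd_r2, higher = sd_rk), 3)
```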


Exhibit 8.10 displays the sample ACF of the residuals from the AR(2) model of the
square root of the hare abundance. The lag 1 autocorrelation here equals -0.261, which
is close to 2 standard errors below zero but not quite. The lag 4 autocorrelation equals
-0.318, but its standard error is 0.169. We conclude that the graph does not show
statistically significant evidence of nonzero autocorrelation in the residuals.†


Exhibit  8.10  Sample  ACF  of  Residuals  from  AR(2)  Model  for  Hare 


[Plot: residual ACF versus Lag]

> acf(residuals(arima(sqrt(hare),order=c(2,0,0))))


With monthly data, we would pay special attention to possible excessive autocorrelation
in the residuals at lags 12, 24, and so forth. With quarterly series, lags 4, 8, and so
forth would merit special attention. Chapter 10 contains examples of these ideas.

It can be shown that results analogous to those for AR models hold for MA models.
In particular, replacing $\phi$ by $\theta$ in Equations (8.1.5), (8.1.6), and (8.1.7) gives the results
for the MA(1) case. Similarly, results for the MA(2) case can be stated by replacing $\phi_1$
and $\phi_2$ by $\theta_1$ and $\theta_2$, respectively, in Equations (8.1.8), (8.1.9), and (8.1.10). Results for
general ARMA models may be found in Box and Pierce (1970) and McLeod (1978).


The  Ljung-Box  Test 


In  addition  to  looking  at  residual  correlations  at  individual  lags,  it  is  useful  to  have  a  test 
that  takes  into  account  their  magnitudes  as  a  group.  For  example,  it  may  be  that  most  of 
the  residual  autocorrelations  are  moderate,  some  even  close  to  their  critical  values,  but, 
taken  together,  they  seem  excessive.  Box  and  Pierce  (1970)  proposed  the  statistic 


$$Q = n(\hat{r}_1^2 + \hat{r}_2^2 + \cdots + \hat{r}_K^2) \qquad (8.1.11)$$


to address this possibility. They showed that if the correct ARMA(p,q) model is estimated,
then, for large n, Q has an approximate chi-square distribution with K - p - q
degrees of freedom. Fitting an erroneous model would tend to inflate Q. Thus, a general
"portmanteau" test would reject the ARMA(p,q) model if the observed value of Q
exceeded an appropriate critical value in a chi-square distribution with K - p - q degrees
of freedom. (Here the maximum lag K is selected somewhat arbitrarily but large enough
that the $\psi$-weights are negligible for j > K.)

† Recall that an AR(3) model fits these data even better and has even less autocorrelation in
its residuals; see Exercise 8.7.

The chi-square distribution for Q is based on a limit theorem as $n \to \infty$, but Ljung
and Box (1978) subsequently discovered that even for n = 100, the approximation is not
satisfactory. By modifying the Q statistic slightly, they defined a test statistic whose null
distribution is much closer to chi-square for typical sample sizes. The modified
Box-Pierce, or Ljung-Box, statistic is given by

$$Q_* = n(n + 2)\left(\frac{\hat{r}_1^2}{n - 1} + \frac{\hat{r}_2^2}{n - 2} + \cdots + \frac{\hat{r}_K^2}{n - K}\right) \qquad (8.1.12)$$

Notice that since (n + 2)/(n - k) > 1 for every k ≥ 1, we have $Q_* > Q$, which partly
explains why the original statistic Q tended to overlook inadequate models. More details
on the exact distributions of $Q_*$ and Q for finite samples can be found in Ljung and Box
(1978); see also Davies, Triggs, and Newbold (1977).
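
Both statistics are simple to compute from the residual autocorrelations. The sketch below codes Equations (8.1.11) and (8.1.12) in a helper of our own naming, ljung_box, and compares the result with R's built-in Box.test, whose fitdf argument subtracts the number of fitted ARMA parameters from the degrees of freedom. The simulated AR(1) series is an assumption used only to have residuals to work with.

```r
# Box-Pierce Q and Ljung-Box Q* from residuals; fitdf = p + q for an ARMA(p,q) fit.
ljung_box <- function(res, K, fitdf = 0) {
  n <- length(res)
  r <- acf(res, lag.max = K, plot = FALSE)$acf[-1]   # drop lag 0, keep lags 1..K
  Q      <- n * sum(r^2)                             # Equation (8.1.11)
  Q_star <- n * (n + 2) * sum(r^2 / (n - 1:K))       # Equation (8.1.12)
  df <- K - fitdf
  list(Q = Q, Q_star = Q_star, df = df,
       p_value = pchisq(Q_star, df, lower.tail = FALSE))
}

set.seed(1)                                          # arbitrary seed
y   <- arima.sim(model = list(ar = 0.5), n = 100)    # simulated stand-in series
fit <- arima(y, order = c(1, 0, 0))

mine <- ljung_box(residuals(fit), K = 6, fitdf = 1)
ref  <- Box.test(residuals(fit), lag = 6, type = "Ljung-Box", fitdf = 1)
c(by_hand = mine$Q_star, built_in = unname(ref$statistic))
```

The two values should agree to rounding error, and Q* exceeds Q as the inequality above requires.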

Exhibit 8.11 lists the first six autocorrelations of the residuals from the AR(1) fitted
model for the color property series. Here n = 35.

Exhibit 8.11 Residual Autocorrelation Values from AR(1) Model for Color

Lag k             1       2       3       4       5       6
Residual ACF   -0.051   0.032   0.047   0.021  -0.017  -0.019

> acf(residuals(m1.color),plot=F)$acf
> signif(acf(residuals(m1.color),plot=F)$acf[1:6],2)
> # display the first 6 acf values to 2 significant digits


The Ljung-Box test statistic with K = 6 is equal to

$$Q_* = 35(35 + 2)\left(\frac{(-0.051)^2}{35 - 1} + \frac{(0.032)^2}{35 - 2} + \frac{(0.047)^2}{35 - 3} + \frac{(0.021)^2}{35 - 4} + \frac{(-0.017)^2}{35 - 5} + \frac{(-0.019)^2}{35 - 6}\right) \approx 0.28$$

This is referred to a chi-square distribution with 6 - 1 = 5 degrees of freedom. This leads
to a p-value of 0.998, so we have no evidence to reject the null hypothesis that the error
terms are uncorrelated.
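
The arithmetic can be verified from the six residual autocorrelations listed in Exhibit 8.11 alone, without the data file. A minimal sketch:

```r
n <- 35
r <- c(-0.051, 0.032, 0.047, 0.021, -0.017, -0.019)   # residual ACF, lags 1 to 6
K <- length(r)

Q_star  <- n * (n + 2) * sum(r^2 / (n - seq_len(K)))  # Ljung-Box statistic
p_value <- pchisq(Q_star, df = K - 1, lower.tail = FALSE)  # one AR parameter fitted

round(Q_star, 2)
round(p_value, 3)
```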

Exhibit  8.12  shows  three  of  our  diagnostic  tools  in  one  display — a  sequence  plot  of 
the  standardized  residuals,  the  sample  ACF  of  the  residuals,  and  p-values  for  the 
Ljung-Box  test  statistic  for  a  whole  range  of  values  of  K  from  5  to  15.  The  horizontal 
dashed  line  at  5%  helps  judge  the  size  of  the  p-values.  In  this  instance,  everything  looks 
very  good.  The  estimated  AR(1)  model  seems  to  be  capturing  the  dependence  structure 
of  the  color  property  time  series  quite  well. 


> win.graph(width=4.875,height=4.5)
> tsdiag(m1.color,gof=15,omit.initial=F)

As in Chapter 3, the runs test may also be used to assess dependence in error terms
via the residuals. Applying the test to the residuals from the AR(3) model for the Canadian
hare abundance series, we obtain expected runs of 16.09677 versus observed runs
of 18. The corresponding p-value is 0.602, so we do not have statistically significant
evidence against independence of the error terms in this model.

8.2  Overfitting  and  Parameter  Redundancy 


Our  second  basic  diagnostic  tool  is  that  of  overfitting.  After  specifying  and  fitting  what 
we  believe  to  be  an  adequate  model,  we  fit  a  slightly  more  general  model;  that  is,  a 
model  “close  by”  that  contains  the  original  model  as  a  special  case.  For  example,  if  an 
AR(2)  model  seems  appropriate,  we  might  overfit  with  an  AR(3)  model.  The  original 
AR(2)  model  would  be  confirmed  if: 

1. the estimate of the additional parameter, $\phi_3$, is not significantly different from
zero, and

2. the estimates for the parameters in common, $\phi_1$ and $\phi_2$, do not change significantly
from their original estimates.
As an example, we have specified, fitted, and examined the residuals of an AR(1)
model for the industrial color property time series. Exhibit 8.13 displays the output from
the R software from fitting the AR(1) model, and Exhibit 8.14 shows the results from
fitting an AR(2) model to the same series. First note that, in Exhibit 8.14, the estimate of
$\phi_2$ is not statistically different from zero. This fact supports the choice of the AR(1)
model. Secondly, we note that the two estimates of $\phi_1$ are quite close, especially when
we take into account the magnitude of their standard errors. Finally, note that while the
AR(2) model has a slightly larger log-likelihood value, the AR(1) fit has a smaller AIC
value. The penalty for fitting the more complex AR(2) model is sufficient to choose the
simpler AR(1) model.
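
The overfitting comparison just described can be sketched in a few lines. The example below uses a simulated AR(1) series as a stand-in (the seed, $\phi$ = 0.6, and n = 200 are arbitrary assumptions), so its numbers will differ from Exhibits 8.13 and 8.14. Note also that AIC values from the base R functions used here need not match those printed in the exhibits.

```r
set.seed(42)                                        # arbitrary seed
y <- arima.sim(model = list(ar = 0.6), n = 200)     # simulated AR(1) stand-in

fit1 <- arima(y, order = c(1, 0, 0))                # tentatively chosen model
fit2 <- arima(y, order = c(2, 0, 0))                # overfit with one extra AR term

# Check 1: is the extra coefficient distinguishable from zero?
se2   <- sqrt(diag(vcov(fit2)))
z_ar2 <- unname(coef(fit2)["ar2"] / se2["ar2"])

# Check 2: does the common coefficient stay put?
ar1_both <- c(ar1_model = unname(coef(fit1)["ar1"]),
              ar2_model = unname(coef(fit2)["ar1"]))

# Penalized comparison.
aic_both <- c(AR1 = AIC(fit1), AR2 = AIC(fit2))

z_ar2; ar1_both; aic_both
```

A ratio z_ar2 well inside ±2, similar ar1 estimates, and a smaller AIC for the AR(1) fit would all support the simpler model.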


Exhibit 8.13 AR(1) Model Results for the Color Property Series

Coefficients:      ar1   Intercept†
                0.5705     74.3293
s.e.            0.1435      1.9151

sigma^2 estimated as 24.83: log-likelihood = -106.07, AIC = 216.15
> m1.color # R code to obtain table

† Recall that the intercept here is the estimate of the process mean $\mu$, not $\theta_0$.


Exhibit 8.14 AR(2) Model Results for the Color Property Series

Coefficients:      ar1      ar2   Intercept
                0.5173   0.1005     74.1551
s.e.            0.1717   0.1815      2.1463

sigma^2 estimated as 24.6: log-likelihood = -105.92, AIC = 217.84
> arima(color,order=c(2,0,0))


A different overfit for this series would be to try an ARMA(1,1) model. Exhibit
8.15 displays the results of this fit. Notice that the standard errors of the estimated
coefficients for this fit are rather larger than what we see in Exhibits 8.13 and 8.14.
Regardless, the estimate of $\phi_1$ from this fit is not significantly different from the estimate in
Exhibit 8.13. Furthermore, as before, the estimate of the new parameter, $\theta$, is not
significantly different from zero. This adds further support to the AR(1) model.


Exhibit 8.15 Overfit of an ARMA(1,1) Model for the Color Series

Coefficients:      ar1       ma1   Intercept
                0.6721   -0.1467     74.1730
s.e.            0.2147    0.2742      2.1357

sigma^2 estimated as 24.63: log-likelihood = -105.94, AIC = 219.88
> arima(color,order=c(1,0,1))


As  we  have  noted,  any  ARMA (p,q)  model  can  be  considered  as  a  special  case  of  a 
more  general  ARMA  model  with  the  additional  parameters  equal  to  zero.  However, 
when  generalizing  ARMA  models,  we  must  be  aware  of  the  problem  of  parameter 
redundancy  or  lack  of  identifiability. 

To make these points clear, consider an ARMA(1,2) model:

$$Y_t = \phi Y_{t-1} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} \qquad (8.2.1)$$

Now replace t by t - 1 to obtain

$$Y_{t-1} = \phi Y_{t-2} + e_{t-1} - \theta_1 e_{t-2} - \theta_2 e_{t-3} \qquad (8.2.2)$$

If we multiply both sides of Equation (8.2.2) by any constant c and then subtract it from
Equation (8.2.1), we obtain (after rearranging)

$$Y_t - (\phi + c)Y_{t-1} + \phi c Y_{t-2} = e_t - (\theta_1 + c)e_{t-1} - (\theta_2 - \theta_1 c)e_{t-2} + c\theta_2 e_{t-3}$$

This apparently defines an ARMA(2,3) process. But notice that we have the factorizations

$$1 - (\phi + c)x + \phi c x^2 = (1 - \phi x)(1 - cx)$$

and

$$1 - (\theta_1 + c)x - (\theta_2 - c\theta_1)x^2 + c\theta_2 x^3 = (1 - \theta_1 x - \theta_2 x^2)(1 - cx)$$

Thus the AR and MA characteristic polynomials in the ARMA(2,3) process have a
common factor of (1 - cx). Even though $Y_t$ does satisfy the ARMA(2,3) model, clearly
the parameters in that model are not unique: the constant c is completely arbitrary. We
say that we have parameter redundancy in the ARMA(2,3) model.†
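
The redundancy can also be seen numerically: the ARMA(1,2) model and the apparent ARMA(2,3) model imply the same autocorrelation function. The sketch below uses hypothetical parameter values and a small helper of our own, polymul, to multiply the characteristic polynomials by (1 - cx). One caution: R's ARMAacf writes the MA part with plus signs, so the book's $\theta$'s enter with their signs reversed.

```r
# Multiply two polynomials given as coefficient vectors (constant term first).
polymul <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a))
    for (j in seq_along(b))
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
  out
}

phi <- 0.5; theta1 <- 0.4; theta2 <- 0.3; cc <- 0.6   # hypothetical values

ar_poly <- polymul(c(1, -phi), c(1, -cc))              # 1 - (phi + c)x + phi*c*x^2
ma_poly <- polymul(c(1, -theta1, -theta2), c(1, -cc))  # cubic MA polynomial

# ARMAacf takes phi_1, phi_2, ... (negated AR polynomial coefficients) and
# MA coefficients in R's plus-sign convention (polynomial coefficients as is).
acf_12 <- ARMAacf(ar = phi, ma = c(-theta1, -theta2), lag.max = 10)
acf_23 <- ARMAacf(ar = -ar_poly[-1], ma = ma_poly[-1], lag.max = 10)

max(abs(acf_12 - acf_23))   # essentially zero: the two models are indistinguishable
```

Changing cc to any other value with magnitude below 1 leaves acf_23 unchanged, which is exactly the lack of identifiability described above.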

The  implications  for  fitting  and  overfitting  models  are  as  follows: 

1. Specify the original model carefully. If a simple model seems at all promising,
check it out before trying a more complicated model.

2. When overfitting, do not increase the orders of both the AR and MA parts of the
model simultaneously.

3. Extend the model in directions suggested by the analysis of the residuals. For
example, if after fitting an MA(1) model, substantial correlation remains at lag 2
in the residuals, try an MA(2), not an ARMA(1,1).

† In backshift notation, if $\phi(B)Y_t = \theta(B)e_t$ is a correct model, then so is
$(1 - cB)\phi(B)Y_t = (1 - cB)\theta(B)e_t$ for any constant c. To have unique parameterization
in an ARMA model, we must cancel any common factors in the AR and MA characteristic polynomials.

As an example, consider the color property series once more. We have seen that an
AR(1) model fits quite well. Suppose we try an ARMA(2,1) model. The results of this
fit are shown in Exhibit 8.16. Notice that even though the estimate of $\sigma_e^2$ and the
log-likelihood and AIC values are not too far from their best values, the estimates of $\phi_1$,
$\phi_2$, and $\theta$ are way off, and none would be considered different from zero statistically.


Exhibit 8.16 Overfitted ARMA(2,1) Model for the Color Property Series

Coefficients:      ar1      ar2      ma1   Intercept
                0.2189   0.2735   0.3036     74.1653
s.e.            2.0056   1.1376   2.0650      2.1121

sigma^2 estimated as 24.58: log-likelihood = -105.91, AIC = 219.82
> arima(color,order=c(2,0,1))


8.3  Summary 


The ideas of residual analysis begun in Chapter 3 were considerably expanded in this
chapter. We looked at various plots of the residuals, checking the error terms for constant
variance, normality, and independence. The properties of the sample autocorrelation
of the residuals play a significant role in these diagnostics. The Ljung-Box statistic
portmanteau test was discussed as a summary of the autocorrelation in the residuals.
Lastly, the ideas of overfitting and parameter redundancy were presented.


Exercises 


8.1 For an AR(1) model with $\phi \approx 0.5$ and n = 100, the lag 1 sample autocorrelation of
the residuals is 0.5. Should we consider this unusual? Why or why not?

8.2 Repeat Exercise 8.1 for an MA(1) model with $\theta \approx 0.5$ and n = 100.

8.3 Based on a series of length n = 200, we fit an AR(2) model and obtain residual
autocorrelations of $\hat{r}_1$ = 0.13, $\hat{r}_2$ = 0.13, and $\hat{r}_3$ = 0.12. If $\hat{\phi}_1$ = 1.1 and $\hat{\phi}_2$ = -0.8,
do these residual autocorrelations support the AR(2) specification? Individually?
Jointly?
8.4 Simulate an AR(1) model with n = 30 and $\phi$ = 0.5.

(a)  Fit  the  correctly  specified  AR(1)  model  and  look  at  a  time  series  plot  of  the 
residuals.  Does  the  plot  support  the  AR(1)  specification? 

(b)  Display  a  normal  quantile-quantile  plot  of  the  standardized  residuals.  Does 
the  plot  support  the  AR(1)  specification? 

(c)  Display  the  sample  ACF  of  the  residuals.  Does  the  plot  support  the  AR(1) 
specification? 

(d) Calculate the Ljung-Box statistic summing to K = 8. Does this statistic support
the AR(1) specification?

8.5 Simulate an MA(1) model with n = 36 and $\theta$ = -0.5.

(a)  Fit  the  correctly  specified  MA(1)  model  and  look  at  a  time  series  plot  of  the 
residuals.  Does  the  plot  support  the  MA(1)  specification? 

(b)  Display  a  normal  quantile-quantile  plot  of  the  standardized  residuals.  Does 
the  plot  support  the  MA(1)  specification? 

(c)  Display  the  sample  ACF  of  the  residuals.  Does  the  plot  support  the  MA(1) 
specification? 

(d) Calculate the Ljung-Box statistic summing to K = 6. Does this statistic support
the MA(1) specification?

8.6 Simulate an AR(2) model with n = 48, $\phi_1$ = 1.5, and $\phi_2$ = -0.75.

(a)  Fit  the  correctly  specified  AR(2)  model  and  look  at  a  time  series  plot  of  the 
residuals.  Does  the  plot  support  the  AR(2)  specification? 

(b)  Display  a  normal  quantile-quantile  plot  of  the  standardized  residuals.  Does 
the  plot  support  the  AR(2)  specification? 

(c)  Display  the  sample  ACF  of  the  residuals.  Does  the  plot  support  the  AR(2) 
specification? 

(d) Calculate the Ljung-Box statistic summing to K = 12. Does this statistic support
the AR(2) specification?

8.7 Fit an AR(3) model by maximum likelihood to the square root of the hare abundance
series (filename hare).

(a)  Plot  the  sample  ACF  of  the  residuals.  Comment  on  the  size  of  the  correlations. 

(b) Calculate the Ljung-Box statistic summing to K = 9. Does this statistic support
the AR(3) specification?

(c)  Perform  a  runs  test  on  the  residuals  and  comment  on  the  results. 

(d)  Display  the  quantile-quantile  normal  plot  of  the  residuals.  Comment  on  the 
plot. 

(e)  Perform  the  Shapiro-Wilk  test  of  normality  on  the  residuals. 

8.8 Consider the oil filter sales data shown in Exhibit 1.8 on page 7. The data are in
the file named oilfilters.

(a) Fit an AR(1) model to this series. Is the estimate of the $\phi$ parameter significantly
different from zero statistically?

(b) Display the sample ACF of the residuals from the AR(1) fitted model. Comment
on the display.
8.9 The data file named robot contains a time series obtained from an industrial robot.
The  robot  was  put  through  a  sequence  of  maneuvers,  and  the  distance  from  a 
desired  ending  point  was  recorded  in  inches.  This  was  repeated  324  times  to  form 
the  time  series.  Compare  the  fits  of  an  AR(1)  model  and  an  IMA(1,1)  model  for 
these  data  in  terms  of  the  diagnostic  tests  discussed  in  this  chapter. 

8.10  The  data  file  named  deere3  contains  57  consecutive  values  from  a  complex 
machine  tool  at  Deere  &  Co.  The  values  given  are  deviations  from  a  target  value 
in  units  of  ten  millionths  of  an  inch.  The  process  employs  a  control  mechanism 
that resets some of the parameters of the machine tool depending on the magnitude
of deviation from target of the last item produced. Diagnose the fit of an
AR(1)  model  for  these  data  in  terms  of  the  tests  discussed  in  this  chapter. 

8.11 Exhibit 6.31 on page 139 suggested specifying either an AR(1) or possibly an
AR(4) model for the difference of the logarithms of the oil price series. (The filename
is oil.price.)

(a)  Estimate  both  of  these  models  using  maximum  likelihood  and  compare  the 
results  using  the  diagnostic  tests  considered  in  this  chapter. 

(b) Exhibit 6.32 on page 140 suggested specifying an MA(1) model for the difference
of the logs. Estimate this model by maximum likelihood and perform
the diagnostic tests considered in this chapter.

(c)  Which  of  the  three  models  AR(1),  AR(4),  or  MA(1)  would  you  prefer  given 
the  results  of  parts  (a)  and  (b)? 


Chapter  9 


Forecasting 


One of the primary objectives of building a model for a time series is to be able to forecast
the values for that series at future times. Of equal importance is the assessment of
the precision of those forecasts. In this chapter, we shall consider the calculation of forecasts
and their properties for both deterministic trend models and ARIMA models. Forecasts
for models that combine deterministic trends with ARIMA stochastic components
are considered also.

For the most part, we shall assume that the model is known exactly, including specific
values for all the parameters. Although this is never true in practice, the use of estimated
parameters for large sample sizes does not seriously affect the results.

9.1  Minimum  Mean  Square  Error  Forecasting 


Based on the available history of the series up to time t, namely $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$, we
would like to forecast the value of $Y_{t+\ell}$ that will occur $\ell$ time units into the future. We
call time t the forecast origin and $\ell$ the lead time for the forecast, and denote the forecast
itself as $\hat{Y}_t(\ell)$.

As shown in Appendix F, the minimum mean square error forecast is given by

$$\hat{Y}_t(\ell) = E(Y_{t+\ell} \mid Y_1, Y_2, \ldots, Y_t) \qquad (9.1.1)$$

(Appendices E and F on page 218 review the properties of conditional expectation and
minimum mean square error prediction.)

The computation and properties of this conditional expectation as related to forecasting
will be our concern for the remainder of this chapter.

9.2  Deterministic  Trends 


Consider once more the deterministic trend model of Chapter 3,

$$Y_t = \mu_t + X_t \qquad (9.2.1)$$

where the stochastic component, $X_t$, has a mean of zero. For this section, we shall
assume that $\{X_t\}$ is in fact white noise with variance $\gamma_0$. For the model in Equation
(9.2.1), we have

$$\hat{Y}_t(\ell) = E(\mu_{t+\ell} + X_{t+\ell} \mid Y_1, Y_2, \ldots, Y_t)$$
$$= E(\mu_{t+\ell} \mid Y_1, Y_2, \ldots, Y_t) + E(X_{t+\ell} \mid Y_1, Y_2, \ldots, Y_t)$$
$$= \mu_{t+\ell}$$

or

$$\hat{Y}_t(\ell) = \mu_{t+\ell} \qquad (9.2.2)$$

since for $\ell \ge 1$, $X_{t+\ell}$ is independent of $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$ and has expected value zero.
Thus, in this simple case, forecasting amounts to extrapolating the deterministic time
trend into the future.

For the linear trend case, $\mu_t = \beta_0 + \beta_1 t$, the forecast is

$$\hat{Y}_t(\ell) = \beta_0 + \beta_1(t + \ell) \qquad (9.2.3)$$

As we emphasized in Chapter 3, this model assumes that the same linear time trend persists
into the future, and the forecast reflects that assumption. Note that it is the lack of
statistical dependence between $Y_{t+\ell}$ and $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$ that prevents us from
improving on $\mu_{t+\ell}$ as a forecast.

For seasonal models where, say, $\mu_t = \mu_{t+12}$, our forecast is
$\hat{Y}_t(\ell) = \mu_{t+12+\ell} = \hat{Y}_t(\ell + 12)$. Thus the forecast will also be periodic, as desired.

The forecast error, $e_t(\ell)$, is given by

$$\begin{aligned}
e_t(\ell) &= Y_{t+\ell} - \hat{Y}_t(\ell) \\
&= \mu_{t+\ell} + X_{t+\ell} - \mu_{t+\ell} \\
&= X_{t+\ell}
\end{aligned}$$

so that

$$E(e_t(\ell)) = E(X_{t+\ell}) = 0$$

That is, the forecasts are unbiased. Also

$$\operatorname{Var}(e_t(\ell)) = \operatorname{Var}(X_{t+\ell}) = \gamma_0 \qquad (9.2.4)$$

is the forecast error variance for all lead times $\ell$.

The cosine trend model for the average monthly temperature series was estimated
in Chapter 3 on page 35 as

$$\hat{\mu}_t = 46.2660 + (-26.7079)\cos(2\pi t) + (-2.1697)\sin(2\pi t)$$

Here time is measured in years with a starting value of January 1964, frequency $f = 1$ per
year, and the final observed value is for December 1975. To forecast the June 1976
temperature value, we use $t = 1976.41667$ as the time value* and obtain

$$\begin{aligned}
\hat{\mu}_t &= 46.2660 + (-26.7079)\cos(2\pi(1976.41667)) + (-2.1697)\sin(2\pi(1976.41667)) \\
&= 68.3\ ^\circ\mathrm{F}
\end{aligned}$$

* Counting January as month 0, June is month 5 of the year, and 5/12 ≈ 0.416666666....

Forecasts  for  other  months  are  obtained  similarly. 
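
The June 1976 calculation can be reproduced with a few lines of R. This is only a sketch: the coefficients are copied from the fitted cosine trend quoted above, and the helper name `mu_hat` is introduced here for illustration.

```r
# Fitted cosine trend for the temperature series (coefficients copied from the text).
# mu_hat is an illustrative helper name, not a function from any package.
mu_hat <- function(t) {
  46.2660 + (-26.7079) * cos(2 * pi * t) + (-2.1697) * sin(2 * pi * t)
}

# Time is measured in years; January 1976 is 1976 + 0/12, so June is 1976 + 5/12.
t_june <- 1976 + 5 / 12
june_forecast <- mu_hat(t_june)
print(round(june_forecast, 1))

# Forecasts for all twelve months of 1976 follow the same pattern.
all_1976 <- mu_hat(1976 + (0:11) / 12)
print(round(all_1976, 1))
```

Because the trend has period one year, the same twelve values would be obtained for any later year, which is the periodicity of the forecasts noted earlier in this section.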

9.3  ARIMA  Forecasting 


For  ARIMA  models,  the  forecasts  can  be  expressed  in  several  different  ways.  Each 
expression  contributes  to  our  understanding  of  the  overall  forecasting  procedure  with 
respect  to  computing,  updating,  assessing  precision,  or  long-term  forecasting  behavior. 

AR(1) 

We shall first illustrate many of the ideas with the simple AR(1) process with a nonzero
mean that satisfies

$$Y_t - \mu = \phi(Y_{t-1} - \mu) + e_t \qquad (9.3.1)$$

Consider the problem of forecasting one time unit into the future. Replacing $t$ by $t + 1$ in
Equation (9.3.1), we have

$$Y_{t+1} - \mu = \phi(Y_t - \mu) + e_{t+1} \qquad (9.3.2)$$

Given $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$, we take the conditional expectations of both sides of Equation
(9.3.2) and obtain

$$\hat{Y}_t(1) - \mu = \phi[E(Y_t \mid Y_1, Y_2, \ldots, Y_t) - \mu] + E(e_{t+1} \mid Y_1, Y_2, \ldots, Y_t) \qquad (9.3.3)$$

Now, from the properties of conditional expectation, we have

$$E(Y_t \mid Y_1, Y_2, \ldots, Y_t) = Y_t \qquad (9.3.4)$$

Also, since $e_{t+1}$ is independent of $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$, we obtain

$$E(e_{t+1} \mid Y_1, Y_2, \ldots, Y_t) = E(e_{t+1}) = 0 \qquad (9.3.5)$$

Thus, Equation (9.3.3) can be written as

$$\hat{Y}_t(1) = \mu + \phi(Y_t - \mu) \qquad (9.3.6)$$

In words, a proportion $\phi$ of the current deviation from the process mean is added to the
process mean to forecast the next process value.

Now consider a general lead time $\ell$. Replacing $t$ by $t + \ell$ in Equation (9.3.1) and taking
the conditional expectations of both sides produces

$$\hat{Y}_t(\ell) = \mu + \phi[\hat{Y}_t(\ell - 1) - \mu] \quad \text{for } \ell \ge 1 \qquad (9.3.7)$$

since $E(Y_{t+\ell-1} \mid Y_1, Y_2, \ldots, Y_t) = \hat{Y}_t(\ell - 1)$ and, for $\ell \ge 1$, $e_{t+\ell}$ is independent of
$Y_1, Y_2, \ldots, Y_{t-1}, Y_t$.

Equation (9.3.7), which is recursive in the lead time $\ell$, shows how the forecast for
any lead time $\ell$ can be built up from the forecasts for shorter lead times by starting with
the initial forecast $\hat{Y}_t(1)$ computed using Equation (9.3.6). The forecast $\hat{Y}_t(2)$ is then
obtained from $\hat{Y}_t(2) = \mu + \phi[\hat{Y}_t(1) - \mu]$, then $\hat{Y}_t(3)$ from $\hat{Y}_t(2)$, and so on until the
desired $\hat{Y}_t(\ell)$ is found. Equation (9.3.7) and its generalizations for other ARIMA models
are most convenient for actually computing the forecasts. Equation (9.3.7) is sometimes
called the difference equation form of the forecasts.

However, Equation (9.3.7) can also be solved to yield an explicit expression for the
forecasts in terms of the observed history of the series. Iterating backward on $\ell$ in
Equation (9.3.7), we have

$$\begin{aligned}
\hat{Y}_t(\ell) &= \phi[\hat{Y}_t(\ell - 1) - \mu] + \mu \\
&= \phi\{\phi[\hat{Y}_t(\ell - 2) - \mu]\} + \mu \\
&\;\;\vdots \\
&= \phi^{\ell-1}[\hat{Y}_t(1) - \mu] + \mu
\end{aligned}$$

or

$$\hat{Y}_t(\ell) = \mu + \phi^{\ell}(Y_t - \mu) \qquad (9.3.8)$$

The current deviation from the mean is discounted by a factor $\phi^{\ell}$, whose magnitude
decreases with increasing lead time. The discounted deviation is then added to the
process mean to produce the lead $\ell$ forecast.

As a numerical example, consider the AR(1) model that we have fitted to the industrial
color property time series. The maximum likelihood estimation results were partially
shown in Exhibit 7.7 on page 165, but more complete results are shown in Exhibit
9.1.


Exhibit 9.1  Maximum Likelihood Estimation of an AR(1) Model for Color

Coefficients:      ar1    intercept†
                0.5705      74.3293
s.e.            0.1435       1.9151

sigma^2 estimated as 24.8: log-likelihood = -106.07, AIC = 216.15
† Remember that the intercept here is the estimate of the process mean $\mu$, not $\theta_0$.

> library(TSA)   # assumed: the companion package that supplies the color data set
> data(color)
> m1.color=arima(color,order=c(1,0,0))
> m1.color


For illustration purposes, we assume that the estimates $\phi = 0.5705$ and $\mu = 74.3293$ are
true values. The final forecasts may then be rounded.

The last observed value of the color property is 67, so we would forecast one time
period ahead as‡

$$\begin{aligned}
\hat{Y}_t(1) &= 74.3293 + (0.5705)(67 - 74.3293) \\
&= 74.3293 - 4.181366 \\
&= 70.14793
\end{aligned}$$

For lead time 2, we have from Equation (9.3.7)

$$\begin{aligned}
\hat{Y}_t(2) &= 74.3293 + 0.5705(70.14793 - 74.3293) \\
&= 74.3293 - 2.385472 \\
&= 71.94383
\end{aligned}$$

Alternatively, we can use Equation (9.3.8):

$$\begin{aligned}
\hat{Y}_t(2) &= 74.3293 + (0.5705)^2(67 - 74.3293) \\
&= 71.94383
\end{aligned}$$

At lead 5, we have

$$\begin{aligned}
\hat{Y}_t(5) &= 74.3293 + (0.5705)^5(67 - 74.3293) \\
&= 73.88636
\end{aligned}$$

and by lead 10 the forecast is

$$\hat{Y}_t(10) = 74.30253$$

which is very nearly $\mu$ (= 74.3293). In reporting these forecasts we would probably
round to the nearest tenth.

In general, since $|\phi| < 1$, we have simply

$$\hat{Y}_t(\ell) \approx \mu \quad \text{for large } \ell \qquad (9.3.9)$$

Later we shall see that Equation (9.3.9) holds for all stationary ARMA models.
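
The difference equation form (9.3.7) and the explicit form (9.3.8) can be compared numerically. The sketch below takes the rounded estimates and the last observation (67) from the text as if they were exact; the variable names are illustrative only.

```r
# AR(1) forecasts for the color series, treating the rounded estimates as true values.
phi    <- 0.5705
mu     <- 74.3293
y_last <- 67          # last observed value, Y_t
max_lead <- 10

# Difference equation form (9.3.7): each forecast is built from the previous one.
rec  <- numeric(max_lead)
prev <- y_last        # Yhat_t(0) is the observed Y_t
for (l in 1:max_lead) {
  rec[l] <- mu + phi * (prev - mu)
  prev   <- rec[l]
}

# Explicit form (9.3.8): discount the current deviation from the mean by phi^l.
explicit <- mu + phi^(1:max_lead) * (y_last - mu)

print(round(rec[c(1, 2, 5, 10)], 5))
print(isTRUE(all.equal(rec, explicit)))
```

In exact arithmetic the two forms agree; in floating point they agree to within rounding, which is why many decimal places should be carried when iterating by hand.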

Consider now the one-step-ahead forecast error, $e_t(1)$. From Equations (9.3.2)
and (9.3.6), we have

$$\begin{aligned}
e_t(1) &= Y_{t+1} - \hat{Y}_t(1) \\
&= [\phi(Y_t - \mu) + \mu + e_{t+1}] - [\phi(Y_t - \mu) + \mu]
\end{aligned}$$

or

$$e_t(1) = e_{t+1} \qquad (9.3.10)$$

‡ As round-off error will accumulate, you should use many decimal places when performing
recursive calculations.

The white noise process $\{e_t\}$ can now be reinterpreted as a sequence of one-step-ahead
forecast errors. We shall see that Equation (9.3.10) persists for completely general
ARIMA models. Note also that Equation (9.3.10) implies that the forecast error $e_t(1)$ is
independent of the history of the process $Y_1, Y_2, \ldots, Y_{t-1}, Y_t$ up to time $t$. If this were
not so, the dependence could be exploited to improve our forecast.

Equation (9.3.10) also implies that our one-step-ahead forecast error variance is
given by

$$\operatorname{Var}(e_t(1)) = \sigma_e^2 \qquad (9.3.11)$$

To investigate the properties of the forecast errors for longer leads, it is convenient to
express the AR(1) model in general linear process, or MA($\infty$), form. From Equation
(4.3.8) on page 70, we recall that

$$Y_t = e_t + \phi e_{t-1} + \phi^2 e_{t-2} + \phi^3 e_{t-3} + \cdots \qquad (9.3.12)$$

(Equation (9.3.12) is written for a zero-mean process; with a nonzero mean $\mu$, $Y_t$ is
replaced by $Y_t - \mu$.)

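Then Equations (9.3.8) and (9.3.12) together yield

$$\begin{aligned}
e_t(\ell) &= Y_{t+\ell} - \mu - \phi^{\ell}(Y_t - \mu) \\
&= e_{t+\ell} + \phi e_{t+\ell-1} + \cdots + \phi^{\ell-1} e_{t+1} + \phi^{\ell} e_t + \cdots - \phi^{\ell}(e_t + \phi e_{t-1} + \cdots)
\end{aligned}$$

so that

$$e_t(\ell) = e_{t+\ell} + \phi e_{t+\ell-1} + \cdots + \phi^{\ell-1} e_{t+1} \qquad (9.3.13)$$

which can also be written as

$$e_t(\ell) = e_{t+\ell} + \psi_1 e_{t+\ell-1} + \psi_2 e_{t+\ell-2} + \cdots + \psi_{\ell-1} e_{t+1} \qquad (9.3.14)$$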


Equation  (9.3.14)  will  be  shown  to  hold  for  all  ARIMA  models  (see  Equation  (9.3.43) 
on  page  202). 

Note that $E(e_t(\ell)) = 0$; thus the forecasts are unbiased. Furthermore, from
Equation (9.3.14), we have

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2(1 + \psi_1^2 + \psi_2^2 + \cdots + \psi_{\ell-1}^2) \qquad (9.3.15)$$

We see that the forecast error variance increases as the lead $\ell$ increases. Contrast this
with the result given in Equation (9.2.4) on page 192, for deterministic trend models.

In particular, for the AR(1) case,

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2\left[\frac{1 - \phi^{2\ell}}{1 - \phi^2}\right] \qquad (9.3.16)$$

which we obtain by summing a finite geometric series.

For long lead times, we have

$$\operatorname{Var}(e_t(\ell)) \approx \frac{\sigma_e^2}{1 - \phi^2} \quad \text{for large } \ell \qquad (9.3.17)$$

or, by Equation (4.3.3), page 66,

$$\operatorname{Var}(e_t(\ell)) \approx \operatorname{Var}(Y_t) = \gamma_0 \quad \text{for large } \ell \qquad (9.3.18)$$

Equation (9.3.18) will be shown to be valid for all stationary ARMA processes (see
Equation (9.3.39) on page 201).
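
As a numerical check of Equations (9.3.15) through (9.3.18) for the AR(1) case, the sketch below compares the running sum of squared $\psi$-weights with the closed form and with the limiting value. The parameter values are the color-series estimates used in this chapter, treated as known.

```r
# Forecast error variance for an AR(1) model (illustrative values from the color example).
phi    <- 0.5705
sigma2 <- 24.8
lead   <- 1:10

# For an AR(1) process psi_j = phi^j, with psi_0 = 1.
psi <- phi^(0:(max(lead) - 1))

# Equation (9.3.15): sigma_e^2 times the sum of the first l squared psi-weights.
var_sum <- sigma2 * cumsum(psi^2)

# Equation (9.3.16): closed form from summing the finite geometric series.
var_closed <- sigma2 * (1 - phi^(2 * lead)) / (1 - phi^2)

# Equations (9.3.17)/(9.3.18): the limit is the process variance gamma_0.
gamma0 <- sigma2 / (1 - phi^2)

print(round(cbind(lead, var_sum, var_closed), 4))
print(round(gamma0, 4))
```

The variance rises monotonically with the lead and is already within a small fraction of a percent of $\gamma_0$ by lead 10.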


MA(1) 

To illustrate how to solve the problems that arise in forecasting moving average or
mixed models, consider the MA(1) case with nonzero mean:

$$Y_t = \mu + e_t - \theta e_{t-1}$$

Again replacing $t$ by $t + 1$ and taking conditional expectations of both sides, we have

$$\hat{Y}_t(1) = \mu - \theta E(e_t \mid Y_1, Y_2, \ldots, Y_t) \qquad (9.3.19)$$

However, for an invertible model, Equation (4.5.2) on page 80 shows that $e_t$ is a function
of $Y_1, Y_2, \ldots, Y_t$ and so

$$E(e_t \mid Y_1, Y_2, \ldots, Y_t) = e_t \qquad (9.3.20)$$

In fact, an approximation is involved in this equation since we are conditioning only on
$Y_1, Y_2, \ldots, Y_t$ and not on the infinite history of the process. However, if, as in practice, $t$
is large and the model is invertible, the error in the approximation will be very small. If
the model is not invertible (for example, if we have overdifferenced the data), then
Equation (9.3.20) is not even approximately valid; see Harvey (1981c, p. 161).

Using Equations (9.3.19) and (9.3.20), we have the one-step-ahead forecast for an
invertible MA(1) expressed as

$$\hat{Y}_t(1) = \mu - \theta e_t \qquad (9.3.21)$$

The computation of $e_t$ will be a by-product of estimating the parameters in the model.
Notice once more that the one-step-ahead forecast error is

$$\begin{aligned}
e_t(1) &= Y_{t+1} - \hat{Y}_t(1) \\
&= (\mu + e_{t+1} - \theta e_t) - (\mu - \theta e_t) \\
&= e_{t+1}
\end{aligned}$$

as in Equation (9.3.10), and thus Equation (9.3.11) also obtains.

For longer lead times, we have

$$\hat{Y}_t(\ell) = \mu + E(e_{t+\ell} \mid Y_1, Y_2, \ldots, Y_t) - \theta E(e_{t+\ell-1} \mid Y_1, Y_2, \ldots, Y_t)$$

But, for $\ell > 1$, both $e_{t+\ell}$ and $e_{t+\ell-1}$ are independent of $Y_1, Y_2, \ldots, Y_t$. Consequently,
these conditional expected values are the unconditional expected values, namely zero,
and we have

$$\hat{Y}_t(\ell) = \mu \quad \text{for } \ell > 1 \qquad (9.3.22)$$

Notice here that Equation (9.3.9) on page 195 holds exactly for the MA(1) case when $\ell > 1$.
Since for this model we trivially have $\psi_1 = -\theta$ and $\psi_j = 0$ for $j > 1$, Equations (9.3.14)
and (9.3.15) also hold.
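
The role of $e_t$ in the MA(1) forecast can be illustrated with a small sketch. It assumes known $\mu$ and $\theta$, uses made-up data, and approximates the noise terms by inverting the model recursively from an assumed start-up value $e_0 = 0$ (one common convention, not necessarily the one used by any particular estimation routine). The function name `ma1_forecast` is hypothetical.

```r
# Forecasts for an invertible MA(1) model Y_t = mu + e_t - theta * e_{t-1}.
# ma1_forecast is an illustrative helper, not part of any package.
ma1_forecast <- function(y, mu, theta, max_lead = 3) {
  e <- 0                                # assumed start-up value e_0 = 0
  for (t in seq_along(y)) {
    e <- y[t] - mu + theta * e          # e_t = Y_t - mu + theta * e_{t-1}
  }
  # Lead 1 uses Equation (9.3.21); all longer leads equal mu, Equation (9.3.22).
  c(mu - theta * e, rep(mu, max_lead - 1))
}

y  <- c(10.2, 9.1, 11.0, 10.4, 9.7)     # made-up observations
fc <- ma1_forecast(y, mu = 10, theta = 0.5, max_lead = 3)
print(fc)
```

For an invertible model the influence of the assumed $e_0$ on $e_t$ shrinks like $\theta^t$, which is the approximation discussed after Equation (9.3.20).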

The  Random  Walk  with  Drift 

To  illustrate  forecasting  with  nonstationary  ARIMA  series,  consider  the  random  walk 
with  drift  defined  by 

$$Y_t = Y_{t-1} + \theta_0 + e_t \qquad (9.3.23)$$

Here

$$\hat{Y}_t(1) = E(Y_t \mid Y_1, Y_2, \ldots, Y_t) + \theta_0 + E(e_{t+1} \mid Y_1, Y_2, \ldots, Y_t)$$

so that

$$\hat{Y}_t(1) = Y_t + \theta_0 \qquad (9.3.24)$$

Similarly, the difference equation form for the lead $\ell$ forecast is

$$\hat{Y}_t(\ell) = \hat{Y}_t(\ell - 1) + \theta_0 \quad \text{for } \ell \ge 1 \qquad (9.3.25)$$

and iterating backward on $\ell$ yields the explicit expression

$$\hat{Y}_t(\ell) = Y_t + \theta_0 \ell \quad \text{for } \ell \ge 1 \qquad (9.3.26)$$

In contrast to Equation (9.3.9) on page 195, if $\theta_0 \ne 0$, the forecast does not converge for
long leads but rather follows a straight line with slope $\theta_0$ for all $\ell$.

Note that the presence or absence of the constant term $\theta_0$ significantly alters the
nature of the forecast. For this reason, constant terms should not be included in
nonstationary ARIMA models unless the evidence is clear that the mean of the differenced
series is significantly different from zero. Equation (3.2.3) on page 28 for the variance
of the sample mean will help assess this significance.

However, as we have seen in the AR(1) and MA(1) cases, the one-step-ahead
forecast error is

$$e_t(1) = Y_{t+1} - \hat{Y}_t(1) = e_{t+1}$$

Also

$$\begin{aligned}
e_t(\ell) &= Y_{t+\ell} - \hat{Y}_t(\ell) \\
&= (Y_t + \ell\theta_0 + e_{t+1} + \cdots + e_{t+\ell}) - (Y_t + \ell\theta_0) \\
&= e_{t+1} + e_{t+2} + \cdots + e_{t+\ell}
\end{aligned}$$

which agrees with Equation (9.3.14) on page 196 since in this model $\psi_j = 1$ for all $j$.
(See Equation (5.2.6) on page 93 with $\theta = 0$.)

So, as in Equation (9.3.15), we have

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2 \sum_{j=0}^{\ell-1} \psi_j^2 = \ell\sigma_e^2 \qquad (9.3.27)$$

In contrast to the stationary case, here $\operatorname{Var}(e_t(\ell))$ grows without limit as the forecast
lead time $\ell$ increases. We shall see that this property is characteristic of the forecast error
variance for all nonstationary ARIMA processes.
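
A short sketch of Equations (9.3.26) and (9.3.27) follows. The last observation, drift, and noise variance are hypothetical numbers chosen only to make the arithmetic easy to follow, and `rw_drift_forecast` is an illustrative helper name.

```r
# Forecasts and forecast error variances for a random walk with drift.
# All numerical inputs below are hypothetical.
rw_drift_forecast <- function(y_t, theta0, sigma2, leads) {
  data.frame(
    lead     = leads,
    forecast = y_t + theta0 * leads,   # Equation (9.3.26): straight line with slope theta0
    variance = leads * sigma2          # Equation (9.3.27): grows linearly in the lead
  )
}

fc <- rw_drift_forecast(y_t = 100, theta0 = 0.5, sigma2 = 4, leads = 1:5)
print(fc)
```

Setting `theta0 = 0` gives a flat forecast at the last observed value, which shows how strongly the constant term shapes long-lead forecasts.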


ARMA(p,q)


For the general stationary ARMA($p$,$q$) model, the difference equation form for
computing forecasts is given by

$$\begin{aligned}
\hat{Y}_t(\ell) ={}& \phi_1\hat{Y}_t(\ell - 1) + \phi_2\hat{Y}_t(\ell - 2) + \cdots + \phi_p\hat{Y}_t(\ell - p) + \theta_0 \\
& - \theta_1 E(e_{t+\ell-1} \mid Y_1, Y_2, \ldots, Y_t) - \theta_2 E(e_{t+\ell-2} \mid Y_1, Y_2, \ldots, Y_t) \\
& - \cdots - \theta_q E(e_{t+\ell-q} \mid Y_1, Y_2, \ldots, Y_t)
\end{aligned} \qquad (9.3.28)$$

where

$$E(e_{t+j} \mid Y_1, Y_2, \ldots, Y_t) = \begin{cases} 0 & \text{for } j > 0 \\ e_{t+j} & \text{for } j \le 0 \end{cases} \qquad (9.3.29)$$

We note that $\hat{Y}_t(j)$ is a true forecast for $j > 0$, but for $j \le 0$, $\hat{Y}_t(j) = Y_{t+j}$. As in
Equation (9.3.20) on page 197, Equation (9.3.29) involves some minor approximation. For an
invertible model, Equation (4.5.5) on page 80 shows that, using the $\pi$-weights, $e_t$ can be
expressed as a linear combination of the infinite sequence $Y_t, Y_{t-1}, Y_{t-2}, \ldots$. However,
the $\pi$-weights die out exponentially fast, and the approximation assumes that $\pi_j$ is
negligible for $j > t - q$.

As an example, consider an ARMA(1,1) model. We have

$$\hat{Y}_t(1) = \phi Y_t + \theta_0 - \theta e_t \qquad (9.3.30)$$

with

$$\hat{Y}_t(2) = \phi\hat{Y}_t(1) + \theta_0$$

and, more generally,

$$\hat{Y}_t(\ell) = \phi\hat{Y}_t(\ell - 1) + \theta_0 \quad \text{for } \ell \ge 2 \qquad (9.3.31)$$

using Equation (9.3.30) to get the recursion started.

Equations (9.3.30) and (9.3.31) can be rewritten in terms of the process mean and
then solved by iteration to get the alternative explicit expression

$$\hat{Y}_t(\ell) = \mu + \phi^{\ell}(Y_t - \mu) - \phi^{\ell-1}\theta e_t \quad \text{for } \ell \ge 1 \qquad (9.3.32)$$
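
The agreement between the recursion (9.3.30) and (9.3.31) and the explicit expression (9.3.32) can be checked numerically. In this sketch every number (the parameters, the last observation, and the last residual) is hypothetical, and $\theta_0 = \mu(1 - \phi)$ is used to link the constant term to the process mean.

```r
# ARMA(1,1) forecasts: difference equation form versus explicit form.
# All values below are hypothetical and chosen only for illustration.
phi    <- 0.6
theta  <- 0.3
mu     <- 50
theta0 <- mu * (1 - phi)   # constant term implied by the process mean
y_t    <- 53               # last observation
e_t    <- 1.2              # last residual (a by-product of estimation in practice)
max_lead <- 6

rec <- numeric(max_lead)
rec[1] <- phi * y_t + theta0 - theta * e_t       # Equation (9.3.30)
for (l in 2:max_lead) {
  rec[l] <- phi * rec[l - 1] + theta0            # Equation (9.3.31)
}

# Equation (9.3.32)
explicit <- mu + phi^(1:max_lead) * (y_t - mu) - phi^(0:(max_lead - 1)) * theta * e_t

print(round(cbind(lead = 1:max_lead, rec, explicit), 5))
```

The noise term enters only through the lead 1 forecast; after that the autoregressive part alone carries the forecast toward $\mu$.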


As Equations (9.3.28) and (9.3.29) indicate, the noise terms $e_{t-(q-1)}, \ldots, e_{t-1}, e_t$
appear directly in the computation of the forecasts for leads $\ell = 1, 2, \ldots, q$. However, for
$\ell > q$, the autoregressive portion of the difference equation takes over, and we have

$$\hat{Y}_t(\ell) = \phi_1\hat{Y}_t(\ell - 1) + \phi_2\hat{Y}_t(\ell - 2) + \cdots + \phi_p\hat{Y}_t(\ell - p) + \theta_0 \quad \text{for } \ell > q \qquad (9.3.33)$$

Thus the general nature of the forecast for long lead times will be determined by the
autoregressive parameters $\phi_1, \phi_2, \ldots, \phi_p$ (and the constant term, $\theta_0$, which is related to
the mean of the process).

Recalling from Equation (5.3.17) on page 97 that $\theta_0 = \mu(1 - \phi_1 - \phi_2 - \cdots - \phi_p)$,
we can rewrite Equation (9.3.33) in terms of deviations from $\mu$ as

$$\begin{aligned}
\hat{Y}_t(\ell) - \mu ={}& \phi_1[\hat{Y}_t(\ell - 1) - \mu] + \phi_2[\hat{Y}_t(\ell - 2) - \mu] + \cdots \\
& + \phi_p[\hat{Y}_t(\ell - p) - \mu] \quad \text{for } \ell > q
\end{aligned} \qquad (9.3.34)$$

As a function of lead time $\ell$, $\hat{Y}_t(\ell) - \mu$ follows the same Yule-Walker recursion as the
autocorrelation function $\rho_k$ of the process (see Equation (4.4.8), page 79). Thus, as in
Section 4.3 on page 66 and Section 4.4 on page 77, the roots of the characteristic
equation will determine the general behavior of $\hat{Y}_t(\ell) - \mu$ for large lead times. In particular,
$\hat{Y}_t(\ell) - \mu$ can be expressed as a linear combination of exponentially decaying terms in $\ell$
(corresponding to the real roots) and damped sine wave terms (corresponding to the
pairs of complex roots).

Thus, for any stationary ARMA model, $\hat{Y}_t(\ell) - \mu$ decays to zero as $\ell$ increases, and
the long-term forecast is simply the process mean $\mu$ as given in Equation (9.3.9) on
page 195. This agrees with common sense since for stationary ARMA models the
dependence dies out as the time span between observations increases, and this
dependence is the only reason we can improve on the "naive" forecast of using $\mu$ alone.

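To argue the validity of Equation (9.3.15) for $e_t(\ell)$ in the present generality, we
need to consider a new representation for ARIMA processes. Appendix G shows that
any ARIMA model can be written in truncated linear process form as

$$Y_{t+\ell} = C_t(\ell) + I_t(\ell) \quad \text{for } \ell \ge 1 \qquad (9.3.35)$$

where, for our present purposes, we need only know that $C_t(\ell)$ is a certain function of
$Y_t, Y_{t-1}, \ldots$ and

$$I_t(\ell) = e_{t+\ell} + \psi_1 e_{t+\ell-1} + \psi_2 e_{t+\ell-2} + \cdots + \psi_{\ell-1} e_{t+1} \quad \text{for } \ell \ge 1 \qquad (9.3.36)$$

Furthermore, for invertible models with $t$ reasonably large, $C_t(\ell)$ is a certain function of
the finite history $Y_t, Y_{t-1}, \ldots, Y_1$. Thus we have

$$\begin{aligned}
\hat{Y}_t(\ell) &= E(C_t(\ell) \mid Y_1, Y_2, \ldots, Y_t) + E(I_t(\ell) \mid Y_1, Y_2, \ldots, Y_t) \\
&= C_t(\ell)
\end{aligned}$$

Finally,

$$\begin{aligned}
e_t(\ell) &= Y_{t+\ell} - \hat{Y}_t(\ell) \\
&= [C_t(\ell) + I_t(\ell)] - C_t(\ell) \\
&= I_t(\ell) \\
&= e_{t+\ell} + \psi_1 e_{t+\ell-1} + \psi_2 e_{t+\ell-2} + \cdots + \psi_{\ell-1} e_{t+1}
\end{aligned}$$

Thus, for a general invertible ARIMA process,

$$E[e_t(\ell)] = 0 \quad \text{for } \ell \ge 1 \qquad (9.3.37)$$

and

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2 \sum_{j=0}^{\ell-1} \psi_j^2 \quad \text{for } \ell \ge 1 \qquad (9.3.38)$$

From Equations (4.1.4) and (9.3.38), we see that for long lead times in stationary
ARMA models, we have

$$\operatorname{Var}(e_t(\ell)) \approx \sigma_e^2 \sum_{j=0}^{\infty} \psi_j^2$$

or

$$\operatorname{Var}(e_t(\ell)) \approx \gamma_0 \quad \text{for large } \ell \qquad (9.3.39)$$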

Nonstationary  Models 

As the random walk shows, forecasting for nonstationary ARIMA models is quite
similar to forecasting for stationary ARMA models, but there are some striking differences.
Recall from Equation (5.2.2) on page 92 that an ARIMA($p$,1,$q$) model can be written as
a nonstationary ARMA($p$+1,$q$) model. We shall write this as

$$\begin{aligned}
Y_t ={}& \varphi_1 Y_{t-1} + \varphi_2 Y_{t-2} + \varphi_3 Y_{t-3} + \cdots + \varphi_p Y_{t-p} + \varphi_{p+1} Y_{t-p-1} \\
& + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} - \cdots - \theta_q e_{t-q}
\end{aligned} \qquad (9.3.40)$$

where the script coefficients $\varphi$ are directly related to the block $\phi$ coefficients. In
particular,

$$\varphi_1 = 1 + \phi_1, \qquad \varphi_j = \phi_j - \phi_{j-1} \quad \text{for } j = 2, 3, \ldots, p$$

and $\qquad (9.3.41)$

$$\varphi_{p+1} = -\phi_p$$

For a general order of differencing $d$, we would have $p + d$ of the $\varphi$ coefficients.

From this representation, we can immediately extend Equations (9.3.28), (9.3.29),
and (9.3.30) on page 199 to cover the nonstationary cases by replacing $p$ by $p + d$ and $\phi_j$
by $\varphi_j$.

As an example of the necessary calculations, consider the ARIMA(1,1,1) case.
Here

$$Y_t - Y_{t-1} = \phi(Y_{t-1} - Y_{t-2}) + \theta_0 + e_t - \theta e_{t-1}$$

so that

$$Y_t = (1 + \phi)Y_{t-1} - \phi Y_{t-2} + \theta_0 + e_t - \theta e_{t-1}$$

Thus

$$\begin{aligned}
\hat{Y}_t(1) &= (1 + \phi)Y_t - \phi Y_{t-1} + \theta_0 - \theta e_t \\
\hat{Y}_t(2) &= (1 + \phi)\hat{Y}_t(1) - \phi Y_t + \theta_0
\end{aligned}$$

and

$$\hat{Y}_t(\ell) = (1 + \phi)\hat{Y}_t(\ell - 1) - \phi\hat{Y}_t(\ell - 2) + \theta_0 \qquad (9.3.42)$$

For the general invertible ARIMA model, the truncated linear process representation
given in Equations (9.3.35) and (9.3.36) and the calculations following these equations
show that we can write

$$e_t(\ell) = e_{t+\ell} + \psi_1 e_{t+\ell-1} + \psi_2 e_{t+\ell-2} + \cdots + \psi_{\ell-1} e_{t+1} \quad \text{for } \ell \ge 1 \qquad (9.3.43)$$

and so

$$E(e_t(\ell)) = 0 \quad \text{for } \ell \ge 1 \qquad (9.3.44)$$

and

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2 \sum_{j=0}^{\ell-1} \psi_j^2 \quad \text{for } \ell \ge 1 \qquad (9.3.45)$$

However, for nonstationary series, the $\psi_j$-weights do not decay to zero as $j$ increases.
For example, for the random walk model, $\psi_j = 1$ for all $j$; for the IMA(1,1) model,
$\psi_j = 1 - \theta$ for $j \ge 1$; for the IMA(2,2) case, $\psi_j = 1 + \theta_2 + (1 - \theta_1 - \theta_2)j$ for $j \ge 1$; and for the
ARI(1,1) model, $\psi_j = (1 - \phi^{j+1})/(1 - \phi)$ for $j \ge 1$ (see Chapter 5).

Thus, for any nonstationary model, Equation (9.3.45) shows that the forecast error
variance will grow without bound as the lead time $\ell$ increases. This fact should not be
too surprising since with nonstationary series the distant future is quite uncertain.
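
The ARIMA(1,1,1) recursion and the non-decaying $\psi$-weights can both be illustrated in a few lines. This is a sketch under stated assumptions: the function `arima111_forecast` and all numerical inputs are hypothetical, and `ARMAtoMA()` from the `stats` package is applied to the nonstationary ARMA($p$+1,$q$) form, with $\theta$ sign-reversed because R writes the moving average part with plus signs.

```r
# Difference equation forecasts for an ARIMA(1,1,1) model (illustrative helper).
arima111_forecast <- function(y_t, y_tm1, e_t, phi, theta, theta0 = 0, max_lead = 5) {
  f <- numeric(max_lead)
  prev2 <- y_tm1                         # Yhat_t(-1) = Y_{t-1}
  prev1 <- y_t                           # Yhat_t(0)  = Y_t
  for (l in seq_len(max_lead)) {
    noise <- if (l == 1) -theta * e_t else 0   # e_t enters only at lead 1
    f[l]  <- (1 + phi) * prev1 - phi * prev2 + theta0 + noise
    prev2 <- prev1
    prev1 <- f[l]
  }
  f
}

# Hypothetical inputs: last two observations 9 and 10, last residual 1.
f <- arima111_forecast(y_t = 10, y_tm1 = 9, e_t = 1, phi = 0.5, theta = 0.3)
print(f)

# psi-weights for nonstationary models, obtained from the ARMA(p+1, q) form.
phi <- 0.5
theta <- 0.3
psi_ari11 <- ARMAtoMA(ar = c(1 + phi, -phi), lag.max = 6)   # ARI(1,1)
psi_ima11 <- ARMAtoMA(ar = 1, ma = -theta, lag.max = 6)     # IMA(1,1)
print(psi_ari11)
print(psi_ima11)

# With sigma_e^2 = 1, the forecast error variance (9.3.45) keeps growing.
var_ari11 <- cumsum(c(1, psi_ari11)^2)
print(var_ari11)
```

The successive differences of the forecasts (0.2, 0.1, 0.05, ...) are exactly the ARMA(1,1) forecasts of the differenced series, which decay to zero when $\theta_0 = 0$.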
9.4 Prediction Limits


As in all statistical endeavors, in addition to forecasting or predicting the unknown $Y_{t+\ell}$,
we would like to assess the precision of our predictions.


Deterministic  Trends 


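For the deterministic trend model with a white noise stochastic component $\{X_t\}$, we
recall that

$$\hat{Y}_t(\ell) = \mu_{t+\ell}$$

and

$$\operatorname{Var}(e_t(\ell)) = \operatorname{Var}(X_{t+\ell}) = \gamma_0$$

If the stochastic component is normally distributed, then the forecast error

$$e_t(\ell) = Y_{t+\ell} - \hat{Y}_t(\ell) = X_{t+\ell} \qquad (9.4.1)$$

is also normally distributed. Thus, for a given confidence level $1 - \alpha$, we could use a
standard normal percentile, $z_{1-\alpha/2}$, to claim that

$$P\left[-z_{1-\alpha/2} < \frac{Y_{t+\ell} - \hat{Y}_t(\ell)}{\sqrt{\operatorname{Var}(e_t(\ell))}} < z_{1-\alpha/2}\right] = 1 - \alpha$$

or, equivalently,

$$P\left[\hat{Y}_t(\ell) - z_{1-\alpha/2}\sqrt{\operatorname{Var}(e_t(\ell))} < Y_{t+\ell} < \hat{Y}_t(\ell) + z_{1-\alpha/2}\sqrt{\operatorname{Var}(e_t(\ell))}\right] = 1 - \alpha$$

Thus we may be $(1 - \alpha)100\%$ confident that the future observation $Y_{t+\ell}$ will be
contained within the prediction limits

$$\hat{Y}_t(\ell) \pm z_{1-\alpha/2}\sqrt{\operatorname{Var}(e_t(\ell))} \qquad (9.4.2)$$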

As a numerical example, consider the monthly average temperature series once
more. On page 192, we used the cosine model to predict the June 1976 average
temperature as 68.3°F. The estimate of $\sqrt{\operatorname{Var}(e_t(\ell))} = \sqrt{\gamma_0}$ for this model is 3.7°F. Thus 95%
prediction limits for the average June 1976 temperature are

68.3 ± 1.96(3.7) = 68.3 ± 7.252 or 61.05°F to 75.55°F
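
The same limits can be computed in R from the standard normal percentile. The sketch below uses `qnorm()` from the `stats` package in place of the rounded value 1.96, so the result differs from the hand calculation only in the third decimal place; the forecast and standard deviation are the values quoted above.

```r
# 95% prediction limits for the June 1976 temperature forecast.
forecast <- 68.3    # forecast from the cosine trend model (degrees F)
se       <- 3.7     # estimate of sqrt(gamma_0) quoted in the text
alpha    <- 0.05

z      <- qnorm(1 - alpha / 2)            # standard normal percentile, about 1.96
limits <- forecast + c(-1, 1) * z * se    # prediction limits as in Equation (9.4.2)
print(round(limits, 2))
```

For a deterministic trend with white noise errors the half-width $z\sqrt{\gamma_0}$ is the same at every lead time, so only the center of the interval changes from month to month.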

Readers who are familiar with standard regression analysis will recall that since the
forecast involves estimated regression parameters, the correct forecast error variance is
given by $\gamma_0[1 + (1/n) + c_{n,\ell}]$, where $c_{n,\ell}$ is a certain function of the sample size $n$ and the
lead time $\ell$. However, it may be shown that for the types of trends that we are
considering (namely, cosines and polynomials in time) and for large sample sizes $n$, the $1/n$ and
$c_{n,\ell}$ are both negligible relative to 1. For example, with a cosine trend of period 12 over
$N = n/12$ years, we have that $c_{n,\ell} = 2/n$; thus the correct forecast error variance is
$\gamma_0[1 + (3/n)]$ rather than our approximate $\gamma_0$. For the linear time trend model, it can be
shown that $c_{n,\ell} = 3(n + 2\ell - 1)^2/[n(n^2 - 1)] \approx 3/n$ for moderate lead $\ell$ and large $n$. Thus,
again our approximation seems justified.

ARIMA  Models 

If the white noise terms $\{e_t\}$ in a general ARIMA series each arise independently from a
normal distribution, then from Equation (9.3.43) on page 202, the forecast error
$e_t(\ell)$ will also have a normal distribution, and the steps leading to Equation (9.4.2)
remain valid. However, in contrast to the deterministic trend model, recall that in the
present case

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2 \sum_{j=0}^{\ell-1} \psi_j^2$$

In practice, $\sigma_e^2$ will be unknown and must be estimated from the observed time series.
The necessary $\psi$-weights are, of course, also unknown since they are certain functions
of the unknown $\phi$'s and $\theta$'s. For large sample sizes, these estimations will have little
effect on the actual prediction limits given above.

As a numerical example, consider the AR(1) model that we estimated for the
industrial color property series. From Exhibit 9.1 on page 194, we use $\phi = 0.5705$, $\mu =
74.3293$, and $\sigma_e^2 = 24.8$. For an AR(1) model, we recall Equation (9.3.16) on page 196

$$\operatorname{Var}(e_t(\ell)) = \sigma_e^2\left[\frac{1 - \phi^{2\ell}}{1 - \phi^2}\right]$$

For a one-step-ahead prediction, we have

70.14793 ± 1.96 √24.8 = 70.14793 ± 9.760721 or 60.39 to 79.91

Two steps ahead, we obtain

71.94383 ± 11.23743 or 60.71 to 83.18

Notice that this prediction interval is wider than the previous interval. Forecasting ten
steps ahead leads to

74.30253 ± 11.88443 or 62.42 to 86.19

By lead 10, both the forecast and the forecast limits have settled down to their long-lead
values.
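
The AR(1) prediction limits at leads 1, 2, and 10 can be recomputed from Equations (9.3.8) and (9.3.16). The sketch below treats the rounded estimates $\phi = 0.5705$, $\mu = 74.3293$, and $\sigma_e^2 = 24.8$ as exact and uses 1.96 as the normal percentile; the variable names are illustrative.

```r
# 95% prediction limits for the AR(1) color model at selected leads.
phi    <- 0.5705
mu     <- 74.3293
sigma2 <- 24.8
y_last <- 67
lead   <- c(1, 2, 10)

fc <- mu + phi^lead * (y_last - mu)                          # Equation (9.3.8)
se <- sqrt(sigma2 * (1 - phi^(2 * lead)) / (1 - phi^2))      # from Equation (9.3.16)

lower <- fc - 1.96 * se
upper <- fc + 1.96 * se
print(round(cbind(lead, forecast = fc, half_width = 1.96 * se, lower, upper), 5))

# Long-lead half-width, based on the process variance gamma_0.
print(round(1.96 * sqrt(sigma2 / (1 - phi^2)), 5))
```

The half-width at lead 10 is already indistinguishable, at the reported precision of the limits, from its long-lead value based on $\gamma_0$.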

9.5  Forecasting  Illustrations 


Rather  than  showing  forecast  and  forecast  limit  calculations,  it  is  often  more  instructive 
to  display  appropriate  plots  of  the  forecasts  and  their  limits. 


Deterministic Trends

Exhibit  9.2  displays  the  last  four  years  of  the  average  monthly  temperature  time  series 
together  with  forecasts  and  95%  forecast  limits  for  two  additional  years.  Since  the 
model  fits  quite  well  with  a  relatively  small  error  variance,  the  forecast  limits  are  quite 
close  to  the  fitted  trend  forecast. 


Exhibit 9.2  Forecasts and Limits for the Temperature Cosine Trend


> data(tempdub)
> tempdub1=ts(c(tempdub,rep(NA,24)),start=start(tempdub),
    freq=frequency(tempdub))
> har.=harmonic(tempdub,1)
> m5.tempdub=arima(tempdub,order=c(0,0,0),xreg=har.)
> newhar.=harmonic(ts(rep(1,24),start=c(1976,1),freq=12),1)
> win.graph(width=4.875,height=2.5,pointsize=8)
> plot(m5.tempdub,n.ahead=24,n1=c(1972,1),newxreg=newhar.,
    type='b',ylab='Temperature',xlab='Year')


ARIMA  Models 

We use the industrial color property series as our first illustration of ARIMA
forecasting. Exhibit 9.3 displays this series together with forecasts out to lead time 12 with the
upper and lower 95% prediction limits for those forecasts. In addition, a horizontal line
at the estimate for the process mean is shown. Notice how the forecasts approach the
mean exponentially as the lead time increases. Also note how the prediction limits
increase in width.

Exhibit 9.3  Forecasts and Forecast Limits for the AR(1) Model for Color


> data(color)
> m1.color=arima(color,order=c(1,0,0))
> plot(m1.color,n.ahead=12,type='b',xlab='Time',
    ylab='Color Property')
> abline(h=coef(m1.color)[names(coef(m1.color))=='intercept'])


The  Canadian  hare  abundance  series  was  fitted  by  working  with  the  square  root  of 
the  abundance  numbers  and  then  fitting  an  AR(3)  model.  Notice  how  the  forecasts 
mimic  the  approximate  cycle  in  the  actual  series  even  when  we  forecast  with  a  lead  time 
out  to  25  years  in  Exhibit  9.4. 


Exhibit 9.4  Forecasts from an AR(3) Model for Sqrt(Hare)

[Figure: time series plot of Sqrt(hare) with forecasts; horizontal axis: Year, 1910 to 1960]


> data(hare)
> m1.hare=arima(sqrt(hare),order=c(3,0,0))
> plot(m1.hare,n.ahead=25,type='b',
    xlab='Year',ylab='Sqrt(hare)')
> abline(h=coef(m1.hare)[names(coef(m1.hare))=='intercept'])


9.6 Updating ARIMA Forecasts


Suppose we are forecasting a monthly time series. Our last observation is, say, for
February, and we forecast for March, April, and May. As time goes by, the actual value for
March becomes available. With this new value in hand, we would like to update or
revise (and, one hopes, improve) our forecasts for April and May. Of course, we could
compute new forecasts from scratch. However, there is a simpler way.

For a general forecast origin $t$ and lead time $\ell + 1$, our original forecast is denoted
$\hat{Y}_t(\ell + 1)$. Once the observation at time $t + 1$ becomes available, we would like to update
our forecast as $\hat{Y}_{t+1}(\ell)$. Equations (9.3.35) and (9.3.36) on page 200 yield

$$Y_{t+\ell+1} = C_t(\ell + 1) + e_{t+\ell+1} + \psi_1 e_{t+\ell} + \psi_2 e_{t+\ell-1} + \cdots + \psi_{\ell} e_{t+1}$$

Since $C_t(\ell + 1)$ and $e_{t+1}$ are functions of $Y_{t+1}, Y_t, \ldots$, whereas $e_{t+\ell+1}, e_{t+\ell}, \ldots, e_{t+2}$ are
independent of $Y_{t+1}, Y_t, \ldots$, we quickly obtain the expression

$$\hat{Y}_{t+1}(\ell) = C_t(\ell + 1) + \psi_{\ell} e_{t+1}$$

However, $\hat{Y}_t(\ell + 1) = C_t(\ell + 1)$, and, of course, $e_{t+1} = Y_{t+1} - \hat{Y}_t(1)$. Thus we have
the general updating equation

$$\hat{Y}_{t+1}(\ell) = \hat{Y}_t(\ell + 1) + \psi_{\ell}[Y_{t+1} - \hat{Y}_t(1)] \qquad (9.6.1)$$

Notice that $[Y_{t+1} - \hat{Y}_t(1)]$ is the actual forecast error at time $t + 1$ once $Y_{t+1}$ has been
observed.

As a numerical example, consider the industrial color property time series.
Following Exhibit 9.1 on page 194, we fit an AR(1) model to forecast one step ahead as
$\hat{Y}_{35}(1) = 70.096$ and two steps ahead as $\hat{Y}_{35}(2) = 71.86072$. If now the next color
value becomes available as $Y_{t+1} = Y_{36} = 65$, then we update the forecast for time $t = 37$
as

$$\hat{Y}_{t+1}(1) = \hat{Y}_{36}(1) = 71.86072 + 0.5705(65 - 70.096) = 68.953452$$
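
The updating equation (9.6.1) is a one-line computation. In the sketch below the function name `update_forecast` and its argument names are hypothetical; the numbers are those of the color example, with $\psi_1 = \phi$ for the AR(1) model.

```r
# Updating a forecast when one new observation arrives, Equation (9.6.1).
# update_forecast is an illustrative helper, not a function from any package.
update_forecast <- function(old_forecast_lead_plus_1, psi_l, y_new, old_forecast_lead_1) {
  forecast_error <- y_new - old_forecast_lead_1      # e_{t+1} = Y_{t+1} - Yhat_t(1)
  old_forecast_lead_plus_1 + psi_l * forecast_error
}

phi <- 0.5705
updated <- update_forecast(
  old_forecast_lead_plus_1 = 71.86072,  # Yhat_35(2)
  psi_l                    = phi^1,     # psi_1 = phi for an AR(1) model
  y_new                    = 65,        # Y_36
  old_forecast_lead_1      = 70.096     # Yhat_35(1)
)
print(updated)
```

Because the new observation fell below its forecast, the forecast for time 37 is revised downward by $\psi_1$ times the observed forecast error.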


9.7  Forecast  Weights  and  Exponentially  Weighted 
Moving  Averages 


For ARIMA models without moving average terms, it is clear how the forecasts are
explicitly determined from the observed series $Y_t, Y_{t-1}, \ldots, Y_1$. However, for any model
with $q > 0$, the noise terms appear in the forecasts, and the nature of the forecasts
explicitly in terms of $Y_t, Y_{t-1}, \ldots, Y_1$ is hidden. To bring out this aspect of the forecasts, we
return to the inverted form of any invertible ARIMA process, namely

$$Y_t = \pi_1 Y_{t-1} + \pi_2 Y_{t-2} + \pi_3 Y_{t-3} + \cdots + e_t$$

(See Equation (4.5.5) on page 80.) Thus we can also write

$$Y_{t+1} = \pi_1 Y_t + \pi_2 Y_{t-1} + \pi_3 Y_{t-2} + \cdots + e_{t+1}$$

Taking conditional expectations of both sides, given $Y_t, Y_{t-1}, \ldots, Y_1$, we obtain

$$\hat{Y}_t(1) = \pi_1 Y_t + \pi_2 Y_{t-1} + \pi_3 Y_{t-2} + \cdots \qquad (9.7.1)$$

(We are assuming that $t$ is sufficiently large and/or that the $\pi$-weights die out sufficiently
quickly so that $\pi_t, \pi_{t+1}, \ldots$ are all negligible.)

For any invertible ARIMA model, the $\pi$-weights can be calculated recursively from
the expressions

$$\pi_j = \begin{cases}
\displaystyle\sum_{i=1}^{\min(j,q)} \theta_i \pi_{j-i} + \varphi_j & \text{for } 1 \le j \le p + d \\[2ex]
\displaystyle\sum_{i=1}^{\min(j,q)} \theta_i \pi_{j-i} & \text{for } j > p + d
\end{cases} \qquad (9.7.2)$$

with initial value $\pi_0 = -1$. (Compare this with Equations (4.4.7) on page 79 for the
$\psi$-weights.)
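
The recursion for the $\pi$-weights is straightforward to code. The sketch below is one possible implementation under the conventions of this section (initial value $\pi_0 = -1$, script coefficients $\varphi_1, \ldots, \varphi_{p+d}$ of the nonstationary autoregressive form, and $\theta$'s in the book's sign convention). The function name `pi_weights` and the parameter values are hypothetical.

```r
# Recursive calculation of pi-weights for an invertible ARIMA model.
# varphi: coefficients varphi_1..varphi_{p+d}; theta: theta_1..theta_q.
pi_weights <- function(varphi = numeric(), theta = numeric(), n) {
  q  <- length(theta)
  pd <- length(varphi)
  pw <- numeric(n)
  for (j in seq_len(n)) {
    s <- 0
    for (i in seq_len(min(j, q))) {
      prev <- if (j - i == 0) -1 else pw[j - i]   # pi_0 = -1 by convention
      s <- s + theta[i] * prev
    }
    if (j <= pd) s <- s + varphi[j]               # varphi_j enters only for j <= p + d
    pw[j] <- s
  }
  pw
}

# IMA(1,1) with theta = 0.4: p = 0, d = 1, q = 1, varphi_1 = 1.
pw_ima11 <- pi_weights(varphi = 1, theta = 0.4, n = 6)
print(round(pw_ima11, 5))

# Pure AR(1) with phi = 0.6: the only nonzero weight is pi_1 = phi.
pw_ar1 <- pi_weights(varphi = 0.6, n = 4)
print(pw_ar1)
```

For the IMA(1,1) case the weights come out as $(1 - \theta)\theta^{j-1}$, a geometric sequence.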

Consider in particular the nonstationary IMA(1,1) model

$$Y_t = Y_{t-1} + e_t - \theta e_{t-1}$$

Here $p = 0$, $d = 1$, $q = 1$, with $\varphi_1 = 1$; thus

$$\begin{aligned}
\pi_1 &= \theta\pi_0 + 1 = 1 - \theta \\
\pi_2 &= \theta\pi_1 = \theta(1 - \theta)
\end{aligned}$$

and, generally,

$$\pi_j = \theta\pi_{j-1} \quad \text{for } j > 1$$

Thus we have explicitly

$$\pi_j = (1 - \theta)\theta^{j-1} \quad \text{for } j \ge 1 \qquad (9.7.3)$$

so that, from Equation (9.7.1), we can write

$$\hat{Y}_t(1) = (1 - \theta)Y_t + (1 - \theta)\theta Y_{t-1} + (1 - \theta)\theta^2 Y_{t-2} + \cdots \qquad (9.7.4)$$

In this case, the $\pi$-weights decrease exponentially, and furthermore,

$$\sum_{j=1}^{\infty} \pi_j = (1 - \theta)\sum_{j=1}^{\infty} \theta^{j-1} = \frac{1 - \theta}{1 - \theta} = 1$$

Thus $\hat{Y}_t(1)$ is called an exponentially weighted moving average (EWMA).

Simple algebra shows that we can also write

$$\hat{Y}_t(1) = (1 - \theta)Y_t + \theta\hat{Y}_{t-1}(1) \qquad (9.7.5)$$

and

$$\hat{Y}_t(1) = \hat{Y}_{t-1}(1) + (1 - \theta)[Y_t - \hat{Y}_{t-1}(1)] \qquad (9.7.6)$$

Equations (9.7.5) and (9.7.6) show how to update forecasts from origin $t - 1$ to origin $t$,
and they express the result as a linear combination of the new observation and the old
forecast or in terms of the old forecast and the last observed forecast error.
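
The equivalence of the recursive form (9.7.5) and the explicit weighted average (9.7.4) can be checked on a short series. The data and $\theta$ below are made up, and the recursion is started from the first observation, which is an assumption about the start-up value rather than something prescribed by the model.

```r
# EWMA one-step forecast for an IMA(1,1) model: recursive versus explicit weights.
theta <- 0.4
y <- c(12, 15, 14, 16, 13, 17, 18, 16)   # made-up observations
n <- length(y)

# Recursive form (9.7.5), started with the first observation as the initial forecast.
ewma <- y[1]
for (t in 2:n) {
  ewma <- (1 - theta) * y[t] + theta * ewma
}

# Explicit form (9.7.4) with weights (9.7.3) on y[n], y[n-1], ..., y[2];
# the leftover weight theta^(n-1) falls on the start-up value y[1].
w <- (1 - theta) * theta^(0:(n - 2))
explicit <- sum(w * rev(y[-1])) + theta^(n - 1) * y[1]

print(c(recursive = ewma, explicit = explicit))
print(sum(w) + theta^(n - 1))   # the weights, including the start-up weight, sum to 1
```

With $\theta = 0.4$ the smoothing constant is $1 - \theta = 0.6$, so the most recent observation receives 60% of the total weight.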

Using  EWMA  to  forecast  time  series  has  been  advocated,  mostly  on  an  ad  hoc 
basis,  for  a  number  of  years;  see  Brown  (1962)  and  Montgomery  and  Johnson  (1976). 

The parameter $1 - \theta$ is called the smoothing constant in EWMA literature, and its
selection (estimation) is often quite arbitrary. From the ARIMA model-building
approach, we let the data indicate whether an IMA(1,1) model is appropriate for the
series under consideration. If so, we then estimate $\theta$ in an efficient manner and compute
an EWMA forecast that we are confident is the minimum mean square error forecast. A
comprehensive treatment of exponential smoothing methods and their relationships with
ARIMA models is given in Abraham and Ledolter (1983).

9.8  Forecasting  Transformed  Series 


Differencing 

Suppose  we  are  interested  in  forecasting  a  series  whose  model  involves  a  first  difference 
to  achieve  stationarity.  Two  methods  of  forecasting  can  be  considered: 

1. forecasting the original nonstationary series, for example by using the difference
equation form of Equation (9.3.28) on page 199, with \(\phi\)'s replaced by \(\varphi\)'s
throughout, or

2. forecasting the stationary differenced series \(W_t = Y_t - Y_{t-1}\) and then “undoing”
the difference by summing to obtain the forecast in original terms.

We shall show that both methods lead to the same forecasts. This follows essentially
because differencing is a linear operation and because conditional expectation of a linear
combination is the same linear combination of the conditional expectations.

Consider in particular the IMA(1,1) model. Basing our work on the original nonstationary
series, we forecast as

\[ \hat{Y}_t(1) = Y_t - \theta e_t \tag{9.8.1} \]

and

\[ \hat{Y}_t(\ell) = \hat{Y}_t(\ell - 1) \quad \text{for } \ell > 1 \tag{9.8.2} \]

Consider now the differenced stationary MA(1) series \(W_t = Y_t - Y_{t-1}\). We would forecast
\(W_{t+1}\) as

\[ \hat{W}_t(1) = -\theta e_t \tag{9.8.3} \]

and

\[ \hat{W}_t(\ell) = 0 \quad \text{for } \ell > 1 \tag{9.8.4} \]

However, \(\hat{W}_t(1) = \hat{Y}_t(1) - Y_t\); thus \(\hat{W}_t(1) = -\theta e_t\) is equivalent to \(\hat{Y}_t(1) = Y_t - \theta e_t\)
as before. Similarly, \(\hat{W}_t(\ell) = \hat{Y}_t(\ell) - \hat{Y}_t(\ell - 1)\), and Equation (9.8.4) becomes Equation
(9.8.2), as we have claimed.

The  same  result  would  apply  to  any  model  involving  differences  of  any  order  and 
indeed  to  any  type  of  linear  transformation  with  constant  coefficients.  (Certain  linear 
transformations  other  than  differencing  may  be  applicable  to  seasonal  time  series.  See 
Chapter  10.) 
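
The equivalence of the two methods for the IMA(1,1) model can be illustrated with a short Python sketch. The function names, the example data, and the convention that the first residual is zero are assumptions made for this example only.

```python
def ima11_residuals(y, theta):
    """Recover e_t from e_t = Y_t - Y_{t-1} + theta * e_{t-1}, taking the first residual as 0."""
    e = [0.0]
    for t in range(1, len(y)):
        e.append(y[t] - y[t - 1] + theta * e[t - 1])
    return e


def forecast_direct(y, theta, horizon):
    """Method 1: forecast the original series, Equations (9.8.1) and (9.8.2)."""
    e_t = ima11_residuals(y, theta)[-1]
    return [y[-1] - theta * e_t] * horizon


def forecast_via_differences(y, theta, horizon):
    """Method 2: forecast W_t with Equations (9.8.3) and (9.8.4), then sum the differences."""
    e_t = ima11_residuals(y, theta)[-1]
    w_hat = [-theta * e_t] + [0.0] * (horizon - 1)
    level, out = y[-1], []
    for w in w_hat:
        level += w  # undo the differencing by cumulative summation
        out.append(level)
    return out


series = [10.0, 10.8, 10.1, 11.3, 11.0]  # hypothetical data
print(forecast_direct(series, 0.6, 4))
print(forecast_via_differences(series, 0.6, 4))
```

Both calls should print the same flat sequence of forecasts, as the argument above predicts.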


Log  Transformations 

As we saw earlier, it is frequently appropriate to model the logarithms of the original
series, a nonlinear transformation. Let \(Y_t\) denote the original series value and let
\(Z_t = \log(Y_t)\). It can be shown that we always have

\[ E(Y_{t+\ell} \mid Y_t, Y_{t-1}, \ldots, Y_1) \ge \exp[E(Z_{t+\ell} \mid Z_t, Z_{t-1}, \ldots, Z_1)] \tag{9.8.5} \]

with equality holding only in trivial cases. Thus, the naive forecast \(\exp[\hat{Z}_t(\ell)]\) is not the
minimum mean square error forecast of \(Y_{t+\ell}\). To evaluate the minimum mean square
error forecast in original terms, we shall find the following fact useful: If \(X\) has a normal
distribution with mean \(\mu\) and variance \(\sigma^2\), then

\[ E[\exp(X)] = \exp\left[\mu + \frac{\sigma^2}{2}\right] \]

(This follows, for example, from the moment-generating function for \(X\).) In our application

\[ \mu = E(Z_{t+\ell} \mid Z_t, Z_{t-1}, \ldots, Z_1) \]

and

\[
\begin{aligned}
\sigma^2 &= Var(Z_{t+\ell} \mid Z_t, Z_{t-1}, \ldots, Z_1) \\
&= Var[e_t(\ell) + C_t(\ell) \mid Z_t, Z_{t-1}, \ldots, Z_1] \\
&= Var[e_t(\ell) \mid Z_t, Z_{t-1}, \ldots, Z_1] + Var[C_t(\ell) \mid Z_t, Z_{t-1}, \ldots, Z_1] \\
&= Var[e_t(\ell) \mid Z_t, Z_{t-1}, \ldots, Z_1] \\
&= Var[e_t(\ell)]
\end{aligned}
\]

These follow from Equations (9.3.35) and (9.3.36) (applied to \(Z_t\)) and the fact that \(C_t(\ell)\)
is a function of \(Z_t, Z_{t-1}, \ldots\), whereas \(e_t(\ell)\) is independent of \(Z_t, Z_{t-1}, \ldots\). Thus the
minimum mean square error forecast in the original series is given by

\[ \exp\left\{\hat{Z}_t(\ell) + \frac{1}{2}Var[e_t(\ell)]\right\} \tag{9.8.6} \]


Throughout our discussion of forecasting, we have assumed that minimum mean square
forecast error is the criterion of choice. For normally distributed variables, this is an
excellent criterion. However, if \(Z_t\) has a normal distribution, then \(Y_t = \exp(Z_t)\) has a
log-normal distribution, for which a different criterion may be desirable. In particular, since
the log-normal distribution is asymmetric and has a long right tail, a criterion based on
the mean absolute error may be more appropriate. For this criterion, the optimal forecast
is the median of the distribution of \(Z_{t+\ell}\) conditional on \(Z_t, Z_{t-1}, \ldots, Z_1\). Since the log
transformation preserves medians and since, for a normal distribution, the mean and
median are identical, the naive forecast \(\exp[\hat{Z}_t(\ell)]\) is the optimal forecast for \(Y_{t+\ell}\) in
the sense that it minimizes the mean absolute forecast error.
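
The difference between the two back-transformed forecasts can be seen in a minimal Python sketch. The function name and the numerical inputs are hypothetical; `z_hat` stands for the forecast of the logged series and `var_err` for the forecast error variance on the log scale.

```python
import math


def forecasts_in_original_terms(z_hat, var_err):
    """Return (naive, mmse) forecasts of Y given a forecast z_hat of log(Y).

    naive: exp(z_hat), the conditional median, optimal under mean absolute error
    mmse:  exp(z_hat + var_err / 2), Equation (9.8.6)
    """
    naive = math.exp(z_hat)
    mmse = math.exp(z_hat + 0.5 * var_err)
    return naive, mmse


naive, mmse = forecasts_in_original_terms(z_hat=2.0, var_err=0.25)
print(naive, mmse, mmse / naive)
```

The ratio of the two forecasts is \(\exp\{Var[e_t(\ell)]/2\}\), so the gap widens as the forecast error variance on the log scale grows with the lead time.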

9.9  Summary  of  Forecasting  with  Certain  ARIMA  Models 


Here  we  bring  together  various  forecasting  results  for  special  ARIMA  models. 

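AR(1): \(Y_t = \mu + \phi(Y_{t-1} - \mu) + e_t\)

\[ \hat{Y}_t(\ell) = \mu + \phi[\hat{Y}_t(\ell - 1) - \mu] \quad \text{for } \ell \ge 1 \]

\[ \hat{Y}_t(\ell) = \mu + \phi^{\ell}(Y_t - \mu) \quad \text{for } \ell \ge 1 \]

\[ \hat{Y}_t(\ell) \approx \mu \quad \text{for large } \ell \]

\[ e_t(\ell) = e_{t+\ell} + \phi e_{t+\ell-1} + \cdots + \phi^{\ell-1}e_{t+1} \]

\[ Var(e_t(\ell)) = \sigma_e^2\left[\frac{1 - \phi^{2\ell}}{1 - \phi^2}\right] \]

\[ Var(e_t(\ell)) \approx \frac{\sigma_e^2}{1 - \phi^2} = \gamma_0 \quad \text{for large } \ell \]

\[ \psi_j = \phi^j \quad \text{for } j > 0 \]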
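
The AR(1) results can be checked numerically with a small Python sketch. The function names and the parameter values used here are hypothetical and chosen only for illustration.

```python
def ar1_forecast(y_t, mu, phi, lead):
    """Closed form: mu + phi**lead * (Y_t - mu)."""
    return mu + phi ** lead * (y_t - mu)


def ar1_forecast_recursive(y_t, mu, phi, lead):
    """Recursive form: each forecast is mu + phi * (previous forecast - mu)."""
    forecast = y_t  # the lead-0 "forecast" is the observed value itself
    for _ in range(lead):
        forecast = mu + phi * (forecast - mu)
    return forecast


def ar1_error_variance(sigma2_e, phi, lead):
    """Forecast error variance sigma_e^2 * (1 - phi**(2*lead)) / (1 - phi**2)."""
    return sigma2_e * (1 - phi ** (2 * lead)) / (1 - phi ** 2)


for lead in (1, 2, 5, 50):
    print(lead, ar1_forecast(5.0, 3.0, 0.7, lead), ar1_error_variance(1.0, 0.7, lead))
```

As the lead time grows, the printed forecasts approach the process mean and the variances approach \(\sigma_e^2/(1 - \phi^2)\), the process variance.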


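MA(1): \(Y_t = \mu + e_t - \theta e_{t-1}\)

\[ \hat{Y}_t(1) = \mu - \theta e_t \]

\[ \hat{Y}_t(\ell) = \mu \quad \text{for } \ell > 1 \]

\[ e_t(1) = e_{t+1} \]

\[ e_t(\ell) = e_{t+\ell} - \theta e_{t+\ell-1} \quad \text{for } \ell > 1 \]

\[
Var(e_t(\ell)) =
\begin{cases}
\sigma_e^2 & \text{for } \ell = 1 \\
\sigma_e^2(1 + \theta^2) & \text{for } \ell > 1
\end{cases}
\]

\[
\psi_j =
\begin{cases}
-\theta & \text{for } j = 1 \\
0 & \text{for } j > 1
\end{cases}
\]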

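IMA(1,1) with Constant Term: \(Y_t = Y_{t-1} + \theta_0 + e_t - \theta e_{t-1}\)

\[ \hat{Y}_t(1) = Y_t + \theta_0 - \theta e_t \]

\[ \hat{Y}_t(\ell) = \hat{Y}_t(\ell - 1) + \theta_0 = Y_t + \ell\theta_0 - \theta e_t \quad \text{for } \ell > 1 \]

\[ \hat{Y}_t(1) = (1 - \theta)Y_t + (1 - \theta)\theta Y_{t-1} + (1 - \theta)\theta^2 Y_{t-2} + \cdots \quad (\text{the EWMA for } \theta_0 = 0) \]

\[ e_t(\ell) = e_{t+\ell} + (1 - \theta)e_{t+\ell-1} + (1 - \theta)e_{t+\ell-2} + \cdots + (1 - \theta)e_{t+1} \]

\[ Var(e_t(\ell)) = \sigma_e^2[1 + (\ell - 1)(1 - \theta)^2] \]

\[ \psi_j = 1 - \theta \quad \text{for } j > 0 \]

Note that if \(\theta_0 \ne 0\), the forecasts follow a straight line with slope \(\theta_0\), but if \(\theta_0 = 0\), which
is the usual case, then the forecast is the same for all lead times, namely

\[ \hat{Y}_t(\ell) = Y_t - \theta e_t \]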

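IMA(2,2): \(Y_t = 2Y_{t-1} - Y_{t-2} + \theta_0 + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2}\)

\[
\left.
\begin{aligned}
\hat{Y}_t(1) &= 2Y_t - Y_{t-1} + \theta_0 - \theta_1 e_t - \theta_2 e_{t-1} \\
\hat{Y}_t(2) &= 2\hat{Y}_t(1) - Y_t + \theta_0 - \theta_2 e_t \\
\hat{Y}_t(\ell) &= 2\hat{Y}_t(\ell - 1) - \hat{Y}_t(\ell - 2) + \theta_0 \quad \text{for } \ell > 2
\end{aligned}
\right\} \tag{9.9.1}
\]

\[ \hat{Y}_t(\ell) = A + B\ell + \frac{\theta_0}{2}\ell^2 \tag{9.9.2} \]

where

\[ A = 2\hat{Y}_t(1) - \hat{Y}_t(2) + \theta_0 \tag{9.9.3} \]

and

\[ B = \hat{Y}_t(2) - \hat{Y}_t(1) - \frac{3}{2}\theta_0 \tag{9.9.4} \]

If \(\theta_0 \ne 0\), the forecasts follow a quadratic curve in \(\ell\), but if \(\theta_0 = 0\), the forecasts form a
straight line with slope \(\hat{Y}_t(2) - \hat{Y}_t(1)\) and will pass through the two initial forecasts
\(\hat{Y}_t(1)\) and \(\hat{Y}_t(2)\). It can be shown that \(Var(e_t(\ell))\) is a certain cubic function of \(\ell\); see
Box, Jenkins, and Reinsel (1994, p. 156). We also have

\[ \psi_j = 1 + \theta_2 + (1 - \theta_1 - \theta_2)j \quad \text{for } j > 0 \tag{9.9.5} \]

It can also be shown that forecasting the special case with \(\theta_1 = 2\omega\) and \(\theta_2 = -\omega^2\) is
equivalent to so-called double exponential smoothing with smoothing constant \(1 - \omega\);
see Abraham and Ledolter (1983).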
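
The IMA(2,2) forecast function can be checked numerically. The Python sketch below, with hypothetical function names and hypothetical values for the first two forecasts, runs the difference-equation recursion of Equation (9.9.1) for leads beyond 2 and compares it with the quadratic closed form of Equations (9.9.2) through (9.9.4).

```python
def ima22_recursive(f1, f2, theta0, horizon):
    """Forecasts for leads 1..horizon from the recursion, given the first two forecasts."""
    forecasts = [f1, f2]
    for _ in range(3, horizon + 1):
        forecasts.append(2 * forecasts[-1] - forecasts[-2] + theta0)
    return forecasts[:horizon]


def ima22_closed_form(f1, f2, theta0, lead):
    """Closed form A + B*lead + (theta0 / 2) * lead**2 with A and B as in (9.9.3), (9.9.4)."""
    a = 2 * f1 - f2 + theta0
    b = f2 - f1 - 1.5 * theta0
    return a + b * lead + 0.5 * theta0 * lead ** 2


rec = ima22_recursive(20.0, 21.5, 0.3, 8)  # hypothetical first two forecasts and theta_0
closed = [ima22_closed_form(20.0, 21.5, 0.3, lead) for lead in range(1, 9)]
print(rec)
print(closed)
```

The two printed lists should agree up to rounding. Setting `theta0` to zero gives equally spaced forecasts, that is, a straight line through the first two forecasts.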

9.10  Summary 


Forecasting or predicting future as yet unobserved values is one of the main reasons for
developing time series models. Methods discussed in this chapter are all based on
minimizing the mean square forecasting error. When the model is simply deterministic trend
plus zero mean white noise error, forecasting amounts to extrapolating the trend.
However, if the model contains autocorrelation, the forecasts exploit the correlation to
produce better forecasts than would otherwise be obtained. We showed how to do this with
ARIMA models and investigated the computation and properties of the forecasts. In
special cases, the computation and properties of the forecasts are especially interesting
and we presented them separately. Prediction limits are especially important to assess
the potential accuracy (or otherwise) of the forecasts. Finally, we addressed the problem
of forecasting time series for which the models involve transformation of the original
series.


Exercises 


9.1 For an AR(1) model with \(Y_t = 12.2\), \(\phi = -0.5\), and \(\mu = 10.8\),

(a) Find \(\hat{Y}_t(1)\).

(b) Calculate \(\hat{Y}_t(2)\) in two different ways.

(c) Calculate \(\hat{Y}_t(10)\).

9.2 Suppose that annual sales (in millions of dollars) of the Acme Corporation follow
the AR(2) model \(Y_t = 5 + 1.1Y_{t-1} - 0.5Y_{t-2} + e_t\) with \(\sigma_e^2 = 2\).

(a) If sales for 2005, 2006, and 2007 were $9 million, $11 million, and $10 million,
respectively, forecast sales for 2008 and 2009.

(b) Show that \(\psi_1 = 1.1\) for this model.

(c)  Calculate  95%  prediction  limits  for  your  forecast  in  part  (a)  for  2008. 

(d)  If  sales  in  2008  turn  out  to  be  $12  million,  update  your  forecast  for  2009. 

9.3  Using  the  estimated  cosine  trend  on  page  192: 

(a)  Forecast  the  average  monthly  temperature  in  Dubuque,  Iowa,  for  April  1976. 

(b) Find a 95% prediction interval for that April forecast. (The estimate of \(\sqrt{\gamma_0}\)
for this model is 3.719°F.)

(c)  What  is  the  forecast  for  April,  1977?  For  April  2009? 

9.4  Using  the  estimated  cosine  trend  on  page  192: 

(a)  Forecast  the  average  monthly  temperature  in  Dubuque,  Iowa,  for  May  1976. 

(b) Find a 95% prediction interval for that May 1976 forecast. (The estimate of
\(\sqrt{\gamma_0}\) for this model is 3.719°F.)

9.5 Using the seasonal means model without an intercept shown in Exhibit 3.3 on
page 32:

(a) Forecast the average monthly temperature in Dubuque, Iowa, for April 1976.

(b) Find a 95% prediction interval for that April forecast. (The estimate of \(\sqrt{\gamma_0}\)
for this model is 3.419°F.)

(c)  Compare  your  forecast  with  the  one  obtained  in  Exercise  9.3. 

(d)  What  is  the  forecast  for  April  1977?  April  2009? 

9.6  Using  the  seasonal  means  model  with  an  intercept  shown  in  Exhibit  3.4  on  page 
33: 

(a)  Forecast  the  average  monthly  temperature  in  Dubuque,  Iowa,  for  April  1976. 

(b) Find a 95% prediction interval for that April forecast. (The estimate of \(\sqrt{\gamma_0}\)
for this model is 3.419°F.)

(c)  Compare  your  forecast  with  the  one  obtained  in  Exercise  9.5. 

9.7  Using  the  seasonal  means  model  with  an  intercept  shown  in  Exhibit  3.4  on  page 
33 

(a)  Forecast  the  average  monthly  temperature  in  Dubuque,  Iowa,  for  January 
1976. 

(b) Find a 95% prediction interval for that January forecast. (The estimate of \(\sqrt{\gamma_0}\)
for this model is 3.419°F.)

9.8  Consider  the  monthly  electricity  generation  time  series  shown  in  Exhibit  5.8  on 
page  99.  The  data  are  in  the  file  named  electricity. 

(a) Fit a deterministic trend model containing seasonal means together with a linear
time trend to the logarithms of the electricity values.

(b)  Plot  the  last  five  years  of  the  series  together  with  two  years  of  forecasts  and 
the  95%  forecast  limits.  Interpret  the  plot. 

9.9 Simulate an AR(1) process with \(\phi = 0.8\) and \(\mu = 100\). Simulate 48 values but set
aside the last 8 values to compare forecasts to actual values.

(a) Using the first 40 values of the series, find the values for the maximum likelihood
estimates of \(\phi\) and \(\mu\).

(b) Using the estimated model, forecast the next eight values of the series. Plot
the series together with the eight forecasts. Place a horizontal line at the estimate
of the process mean.

(c)  Compare  the  eight  forecasts  with  the  actual  values  that  you  set  aside. 

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e) Repeat parts (a) through (d) with a new simulated series using the same values
of the parameters and the same sample size.

9.10 Simulate an AR(2) process with \(\phi_1 = 1.5\), \(\phi_2 = -0.75\), and \(\mu = 100\). Simulate 52
values but set aside the last 12 values to compare forecasts to actual values.

(a) Using the first 40 values of the series, find the values for the maximum likelihood
estimates of the \(\phi\)'s and \(\mu\).

(b) Using the estimated model, forecast the next 12 values of the series. Plot the
series together with the 12 forecasts. Place a horizontal line at the estimate of
the process mean.

(c)  Compare  the  12  forecasts  with  the  actual  values  that  you  set  aside. 

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e)  Repeat  parts  (a)  through  (d)  with  a  new  simulated  series  using  the  same  values 
of  the  parameters  and  same  sample  size. 

9.11 Simulate an MA(1) process with \(\theta = 0.6\) and \(\mu = 100\). Simulate 36 values but set
aside the last 4 values to compare forecasts to actual values.

(a) Using the first 32 values of the series, find the values for the maximum likelihood
estimates of \(\theta\) and \(\mu\).

(b)  Using  the  estimated  model,  forecast  the  next  four  values  of  the  series.  Plot  the 
series  together  with  the  four  forecasts.  Place  a  horizontal  line  at  the  estimate 
of  the  process  mean. 

(c)  Compare  the  four  forecasts  with  the  actual  values  that  you  set  aside. 

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e)  Repeat  parts  (a)  through  (d)  with  a  new  simulated  series  using  the  same  values 
of  the  parameters  and  same  sample  size. 

9.12 Simulate an MA(2) process with \(\theta_1 = 1\), \(\theta_2 = -0.6\), and \(\mu = 100\). Simulate 36 values
but set aside the last 4 values to compare forecasts to actual values.

(a) Using the first 32 values of the series, find the values for the maximum likelihood
estimates of the \(\theta\)'s and \(\mu\).

(b)  Using  the  estimated  model,  forecast  the  next  four  values  of  the  series.  Plot  the 
series  together  with  the  four  forecasts.  Place  a  horizontal  line  at  the  estimate 
of  the  process  mean. 

(c)  What  is  special  about  the  forecasts  at  lead  times  3  and  4? 

(d)  Compare  the  four  forecasts  with  the  actual  values  that  you  set  aside. 

(e)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(f)  Repeat  parts  (a)  through  (e)  with  a  new  simulated  series  using  the  same  values 
of  the  parameters  and  same  sample  size. 

9.13 Simulate an ARMA(1,1) process with \(\phi = 0.7\), \(\theta = -0.5\), and \(\mu = 100\). Simulate 50
values but set aside the last 10 values to compare forecasts with actual values.

(a) Using the first 40 values of the series, find the values for the maximum likelihood
estimates of \(\phi\), \(\theta\), and \(\mu\).

(b)  Using  the  estimated  model,  forecast  the  next  ten  values  of  the  series.  Plot  the 
series  together  with  the  ten  forecasts.  Place  a  horizontal  line  at  the  estimate  of 
the  process  mean. 

(c)  Compare  the  ten  forecasts  with  the  actual  values  that  you  set  aside. 

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e) Repeat parts (a) through (d) with a new simulated series using the same values
of the parameters and same sample size.

9.14 Simulate an IMA(1,1) process with \(\theta = 0.8\) and \(\theta_0 = 0\). Simulate 35 values, but set
aside the last five values to compare forecasts with actual values.

(a) Using the first 30 values of the series, find the value for the maximum likelihood
estimate of \(\theta\).

(b) Using the estimated model, forecast the next five values of the series. Plot the
series together with the five forecasts. What is special about the forecasts?

(c) Compare the five forecasts with the actual values that you set aside.

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e)  Repeat  parts  (a)  through  (d)  with  a  new  simulated  series  using  the  same  values 
of  the  parameters  and  same  sample  size. 

9.15 Simulate an IMA(1,1) process with \(\theta = 0.8\) and \(\theta_0 = 10\). Simulate 35 values, but
set aside the last five values to compare forecasts to actual values.

(a) Using the first 30 values of the series, find the values for the maximum likelihood
estimates of \(\theta\) and \(\theta_0\).

(b) Using the estimated model, forecast the next five values of the series. Plot the
series together with the five forecasts. What is special about these forecasts?

(c) Compare the five forecasts with the actual values that you set aside.

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e)  Repeat  parts  (a)  through  (d)  with  a  new  simulated  series  using  the  same  values 
of  the  parameters  and  same  sample  size. 

9.16 Simulate an IMA(2,2) process with \(\theta_1 = 1\), \(\theta_2 = -0.75\), and \(\theta_0 = 0\). Simulate 45
values, but set aside the last five values to compare forecasts with actual values.

(a) Using the first 40 values of the series, find the values for the maximum likelihood
estimates of \(\theta_1\) and \(\theta_2\).

(b) Using the estimated model, forecast the next five values of the series. Plot the
series together with the five forecasts. What is special about the forecasts?

(c) Compare the five forecasts with the actual values that you set aside.

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e)  Repeat  parts  (a)  through  (d)  with  a  new  simulated  series  using  the  same  values 
of  the  parameters  and  same  sample  size. 

9.17 Simulate an IMA(2,2) process with \(\theta_1 = 1\), \(\theta_2 = -0.75\), and \(\theta_0 = 10\). Simulate 45
values, but set aside the last five values to compare forecasts with actual values.

(a) Using the first 40 values of the series, find the values for the maximum likelihood
estimates of \(\theta_1\), \(\theta_2\), and \(\theta_0\).

(b) Using the estimated model, forecast the next five values of the series. Plot the
series together with the five forecasts. What is special about these forecasts?

(c) Compare the five forecasts with the actual values that you set aside.

(d)  Plot  the  forecasts  together  with  95%  forecast  limits.  Do  the  actual  values  fall 
within  the  forecast  limits? 

(e) Repeat parts (a) through (d) with a new simulated series using the same values
of the parameters and same sample size.

9.18 Consider the model \(Y_t = \beta_0 + \beta_1 t + X_t\), where \(X_t = \phi X_{t-1} + e_t\). We assume
that \(\beta_0\), \(\beta_1\), and \(\phi\) are known. Show that the minimum mean square error forecast \(\ell\)
steps ahead can be written as \(\hat{Y}_t(\ell) = \beta_0 + \beta_1(t + \ell) + \phi^{\ell}(Y_t - \beta_0 - \beta_1 t)\).

9.19  Verify  Equation  (9.3.16)  on  page  196. 

9.20  Verify  Equation  (9.3.32)  on  page  200. 

9.21 The data file named deere3 contains 57 consecutive values from a complex
machine tool process at Deere & Co. The values given are deviations from a target
value in units of ten millionths of an inch. The process employs a control
mechanism that resets some of the parameters of the machine tool depending on
the magnitude of deviation from target of the last item produced.

(a)  Using  an  AR(1)  model  for  this  series,  forecast  the  next  ten  values. 

(b)  Plot  the  series,  the  forecasts,  and  95%  forecast  limits,  and  interpret  the  results. 

9.22 The data file named days contains accounting data from the Winegard Co. of
Burlington, Iowa. The data are the number of days until Winegard receives payment
for 130 consecutive orders from a particular distributor of Winegard products.
(The name of the distributor must remain anonymous for confidentiality reasons.)
The time series contains outliers that are quite obvious in the time series plot.
Replace each of the unusual values at “times” 63, 106, and 129 with the much
more typical value of 35 days.

(a)  Use  an  MA(2)  model  to  forecast  the  next  ten  values  of  this  modified  series. 

(b)  Plot  the  series,  the  forecasts,  and  95%  forecast  limits,  and  interpret  the  results. 

9.23 The time series in the data file robot gives the final position in the “x-direction”
after an industrial robot has finished a planned set of exercises. The measurements
are expressed as deviations from a target position. The robot is put through
this planned set of exercises in the hope that its behavior is repeatable and thus
predictable.

(a) Use an IMA(1,1) model to forecast five values ahead. Obtain 95% forecast
limits also.

(b) Display the forecasts, forecast limits, and actual values in a graph and interpret
the results.

(c) Now use an ARMA(1,1) model to forecast five values ahead and obtain 95%
forecast limits. Compare these results with those obtained in part (a).

9.24 Exhibit 9.4 on page 206 displayed the forecasts and 95% forecast limits for the
square root of the Canadian hare abundance. The data are in the file named hare.
Produce a similar plot in original terms. That is, plot the original abundance values
together with the squares of the forecasts and squares of the forecast limits.

9.25  Consider  the  seasonal  means  plus  linear  time  trend  model  for  the  logarithms  of 
the  monthly  electricity  generation  time  series  in  Exercise  9.8.  (The  data  are  in  the 
file  named  electricity.) 

(a) Find the two-year forecasts and forecast limits in original terms. That is,
exponentiate (antilog) the results obtained in Exercise 9.8.

(b) Plot the last five years of the original time series together with two years of
forecasts and the 95% forecast limits, all in original terms. Interpret the plot.


Appendix E: Conditional Expectation


If \(X\) and \(Y\) have joint pdf \(f(x,y)\) and we denote the marginal pdf of \(X\) by \(f(x)\), then the
conditional pdf of \(Y\) given \(X = x\) is given by

\[ f(y \mid x) = \frac{f(x,y)}{f(x)} \]

For a given value of \(x\), the conditional pdf has all of the usual properties of a pdf. In
particular, the conditional expectation of \(Y\) given \(X = x\) is defined as

\[ E(Y \mid X = x) = \int_{-\infty}^{\infty} y f(y \mid x)\,dy \]

As an expected value or mean, the conditional expectation of \(Y\) given \(X = x\) has all of
the usual properties. For example,

\[ E(aY + bZ + c \mid X = x) = aE(Y \mid X = x) + bE(Z \mid X = x) + c \tag{9.E.1} \]

and

\[ E[h(Y) \mid X = x] = \int_{-\infty}^{\infty} h(y) f(y \mid x)\,dy \tag{9.E.2} \]

In addition, several new properties arise:

\[ E[h(X) \mid X = x] = h(x) \tag{9.E.3} \]

That is, given \(X = x\), the random variable \(h(X)\) can be treated like a constant \(h(x)\). More
generally,

\[ E[h(X,Y) \mid X = x] = E[h(x,Y) \mid X = x] \tag{9.E.4} \]

If we set \(E(Y \mid X = x) = g(x)\), then \(g(X)\) is a random variable and we can consider
\(E[g(X)]\). It can be shown that

\[ E[g(X)] = E(Y) \]

which is often written as

\[ E[E(Y \mid X)] = E(Y) \tag{9.E.5} \]

If \(Y\) and \(X\) are independent, then

\[ E(Y \mid X) = E(Y) \tag{9.E.6} \]
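
Equation (9.E.5) can be illustrated with a small discrete example, in which sums over a joint probability mass function take the place of the integrals above. The joint distribution and the helper names below are hypothetical and serve only as a sketch.

```python
# Hypothetical joint pmf P(X = x, Y = y), stored as {(x, y): probability}
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}


def marginal_x(x):
    """P(X = x), obtained by summing the joint pmf over y."""
    return sum(p for (xx, _), p in joint.items() if xx == x)


def cond_mean_y(x):
    """g(x) = E(Y | X = x), the mean of the conditional pmf of Y given X = x."""
    return sum(y * p for (xx, y), p in joint.items() if xx == x) / marginal_x(x)


x_values = {x for (x, _) in joint}
e_y = sum(y * p for (_, y), p in joint.items())                 # E(Y)
e_g = sum(cond_mean_y(x) * marginal_x(x) for x in x_values)     # E[g(X)] = E[E(Y | X)]
print(e_y, e_g)
```

The two printed values should coincide up to rounding error, which is the discrete analogue of \(E[E(Y \mid X)] = E(Y)\).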


Appendix  F:  Minimum  Mean  Square  Error  Prediction 


Suppose \(Y\) is a random variable with mean \(\mu_Y\) and variance \(\sigma_Y^2\). If our object is to
predict \(Y\) using only a constant \(c\), what is the best choice for \(c\)? Clearly, we must first define
best. A common (and convenient) criterion is to choose \(c\) to minimize the mean square
error of prediction, that is, to minimize

\[ g(c) = E[(Y - c)^2] \]

If we expand \(g(c)\), we have

\[ g(c) = E(Y^2) - 2cE(Y) + c^2 \]

Since \(g(c)\) is quadratic in \(c\) and opens upward, solving \(g'(c) = 0\) will produce the
required minimum. We have

\[ g'(c) = -2E(Y) + 2c \]

so that the optimal \(c\) is

\[ c = E(Y) = \mu_Y \tag{9.F.1} \]

Note also that

\[ \min_{-\infty < c < \infty} g(c) = E(Y - \mu_Y)^2 = \sigma_Y^2 \tag{9.F.2} \]

Now consider the situation where a second random variable \(X\) is available and we
wish to use the observed value of \(X\) to help predict \(Y\). Let \(\rho = Corr(X, Y)\). We first
suppose, for simplicity, that only linear functions \(a + bX\) can be used for the prediction. The
mean square error is then given by

\[ g(a,b) = E(Y - a - bX)^2 \]

and expanding we have

\[ g(a,b) = E(Y^2) + a^2 + b^2E(X^2) - 2aE(Y) + 2abE(X) - 2bE(XY) \]

This is also quadratic in \(a\) and \(b\) and opens upward. Thus we can find the point of
minimum by solving simultaneous linear equations \(\partial g(a,b)/\partial a = 0\) and \(\partial g(a,b)/\partial b = 0\).
We have

\[ \partial g(a,b)/\partial a = 2a - 2E(Y) + 2bE(X) = 0 \]

\[ \partial g(a,b)/\partial b = 2bE(X^2) + 2aE(X) - 2E(XY) = 0 \]

which we rewrite as

\[ a + E(X)b = E(Y) \]

\[ E(X)a + E(X^2)b = E(XY) \]

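Multiplying the first equation by \(E(X)\) and subtracting yields

\[ b = \frac{E(XY) - E(X)E(Y)}{E(X^2) - [E(X)]^2} = \frac{Cov(X,Y)}{Var(X)} = \rho\frac{\sigma_Y}{\sigma_X} \tag{9.F.3} \]

Then

\[ a = E(Y) - bE(X) = \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X \tag{9.F.4} \]

If we let \(\hat{Y}\) be the minimum mean square error prediction of \(Y\) based on a linear
function of \(X\), then we can write

\[ \hat{Y} = \left[\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right] + \left[\rho\frac{\sigma_Y}{\sigma_X}\right]X \tag{9.F.5} \]

or

\[ \left[\frac{\hat{Y} - \mu_Y}{\sigma_Y}\right] = \rho\left[\frac{X - \mu_X}{\sigma_X}\right] \tag{9.F.6} \]

In terms of the standardized variables \(\hat{Y}^{*} = (\hat{Y} - \mu_Y)/\sigma_Y\) and \(X^{*} = (X - \mu_X)/\sigma_X\), we have
simply \(\hat{Y}^{*} = \rho X^{*}\).
Also, using Equations (9.F.3) and (9.F.4), we find

\[ \min g(a,b) = \sigma_Y^2(1 - \rho^2) \tag{9.F.7} \]

which provides a proof that \(-1 \le \rho \le +1\) since \(g(a,b) \ge 0\).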

If we compare Equation (9.F.7) with Equation (9.F.2), we see that the minimum
mean square error obtained when we use a linear function of \(X\) to predict \(Y\) is reduced by
a factor of \(1 - \rho^2\) compared with that obtained by ignoring \(X\) and simply using the
constant \(\mu_Y\) for our prediction.
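
The formulas for the best linear predictor and its mean square error can be verified on a small discrete example. The joint distribution and the variable names in this Python sketch are hypothetical; expectations are computed as sums over the joint probability mass function.

```python
import math

# Hypothetical joint pmf P(X = x, Y = y), stored as {(x, y): probability}
joint = {(0, 1.0): 0.1, (0, 3.0): 0.2, (1, 2.0): 0.3, (2, 4.0): 0.4}


def expect(fn):
    """Expected value of fn(X, Y) under the joint pmf."""
    return sum(fn(x, y) * p for (x, y), p in joint.items())


mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)
cov_xy = expect(lambda x, y: (x - mu_x) * (y - mu_y))
rho = cov_xy / math.sqrt(var_x * var_y)

b = cov_xy / var_x      # slope, Equation (9.F.3)
a = mu_y - b * mu_x     # intercept, Equation (9.F.4)
mse_linear = expect(lambda x, y: (y - a - b * x) ** 2)

print(mse_linear, var_y * (1 - rho ** 2), var_y)
```

The first two printed values should agree, and both should be no larger than the third, the mean square error of the constant predictor \(\mu_Y\).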

Let us now consider the more general problem of predicting \(Y\) with an arbitrary
function of \(X\). Once more our criterion will be to minimize the mean square error of
prediction. We need to choose the function \(h(X)\), say, that minimizes

\[ E[Y - h(X)]^2 \tag{9.F.8} \]

Using Equation (9.E.5), we can write this as

\[ E[Y - h(X)]^2 = E(E\{[Y - h(X)]^2 \mid X\}) \tag{9.F.9} \]

Using Equation (9.E.4), the inner expectation can be written as

\[ E\{[Y - h(X)]^2 \mid X = x\} = E\{[Y - h(x)]^2 \mid X = x\} \tag{9.F.10} \]

For each value of \(x\), \(h(x)\) is a constant, and we can apply the result of Equation (9.F.1) to
the conditional distribution of \(Y\) given \(X = x\). Thus, for each \(x\), the best choice of \(h(x)\) is

\[ h(x) = E(Y \mid X = x) \tag{9.F.11} \]

Since this choice of \(h(x)\) minimizes the inner expectation in Equation (9.F.9), it must
also provide the overall minimum of Equation (9.F.8). Thus

\[ h(X) = E(Y \mid X) \tag{9.F.12} \]

is the best predictor of \(Y\) of all functions of \(X\).

If \(X\) and \(Y\) have a bivariate normal distribution, it is well-known that

\[ E(Y \mid X) = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(X - \mu_X) \]

so that the solutions given in Equations (9.F.12) and (9.F.5) coincide. In this case, the
linear predictor is the best of all functions.

More generally, if \(Y\) is to be predicted by a function of \(X_1, X_2, \ldots, X_n\), then it can be
easily argued that the minimum mean square error predictor is given by

\[ E(Y \mid X_1, X_2, \ldots, X_n) \tag{9.F.13} \]


Appendix G: The Truncated Linear Process


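Suppose \(\{Y_t\}\) satisfies the general ARIMA(p,d,q) model with AR characteristic polynomial
\(\phi(x)\), MA characteristic polynomial \(\theta(x)\), and constant term \(\theta_0\). Then the truncated
linear process representation for \(\{Y_t\}\) is given by

\[ Y_{t+\ell} = C_t(\ell) + I_t(\ell) \quad \text{for } \ell \ge 1 \tag{9.G.1} \]

where

\[ I_t(\ell) = \sum_{j=0}^{\ell-1} \psi_j e_{t+\ell-j} \quad \text{for } \ell \ge 1 \tag{9.G.2} \]

\[ C_t(\ell) = \sum_{i=0}^{d} A_i\ell^i + \sum_{i=1}^{r}\sum_{j=0}^{p_i-1} B_{ij}\ell^j G_i^{\ell} \tag{9.G.3} \]

and \(A_i\), \(B_{ij}\), \(i = 1, 2, \ldots, r\), \(j = 1, 2, \ldots, p_i\), are constant in \(\ell\) and depend only on \(Y_t\),
\(Y_{t-1}, \ldots\). Here the \(G_i^{-1}\) denote the \(r\) distinct roots of \(\phi(x) = 0\), with multiplicities \(p_i\).
As always, the \(\psi\)-weights are defined by the identity

\[ \phi(x)(1 - x)^d(1 + \psi_1 x + \psi_2 x^2 + \cdots) = \theta(x) \tag{9.G.4} \]

or

\[ \varphi(x)(1 + \psi_1 x + \psi_2 x^2 + \cdots) = \theta(x) \tag{9.G.5} \]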
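
Equating coefficients of powers of \(x\) in the identity (9.G.5) gives a recursion for the \(\psi\)-weights, which the following Python sketch implements. The function name `psi_weights` and its argument names are assumptions made for this illustration; `varphi` holds the coefficients \(\varphi_1, \ldots, \varphi_{p+d}\) of the generalized AR polynomial and `theta` holds \(\theta_1, \ldots, \theta_q\).

```python
def psi_weights(varphi, theta, n):
    """Return [psi_0, psi_1, ..., psi_n] by matching coefficients in identity (9.G.5).

    psi_0 = 1 and, for k >= 1,
    psi_k = varphi_1*psi_{k-1} + ... + varphi_{p+d}*psi_{k-p-d} - theta_k,
    where theta_k = 0 for k > q and psi_j = 0 for j < 0.
    """
    psi = [1.0]
    for k in range(1, n + 1):
        value = -theta[k - 1] if k <= len(theta) else 0.0
        value += sum(varphi[i - 1] * psi[k - i]
                     for i in range(1, min(k, len(varphi)) + 1))
        psi.append(value)
    return psi


# IMA(1,1) with theta = 0.6: the generalized AR polynomial is 1 - x, so varphi_1 = 1
print(psi_weights([1.0], [0.6], 5))
# IMA(2,2) with theta_1 = 1, theta_2 = -0.75: (1 - x)^2 = 1 - 2x + x^2
print(psi_weights([2.0, -1.0], [1.0, -0.75], 5))
```

For the IMA(1,1) case every weight after \(\psi_0\) should equal \(1 - \theta\), and for the IMA(2,2) case the weights should grow linearly in \(j\).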

We shall show that the representation given by Equation (9.G.1) is valid by arguing
that, for fixed \(t\), \(C_t(\ell)\) is essentially the complementary function of the defining
difference equation, that is,

\[ C_t(\ell) - \varphi_1 C_t(\ell - 1) - \varphi_2 C_t(\ell - 2) - \cdots - \varphi_{p+d}C_t(\ell - p - d) = \theta_0 \quad \text{for } \ell \ge 0 \tag{9.G.6} \]

and that \(I_t(\ell)\) is a particular solution (without \(\theta_0\)):

\[
\begin{aligned}
I_t(\ell) - \varphi_1 I_t(\ell - 1) - \varphi_2 I_t(\ell - 2) - \cdots - \varphi_{p+d}I_t(\ell - p - d) \\
= e_{t+\ell} - \theta_1 e_{t+\ell-1} - \theta_2 e_{t+\ell-2} - \cdots - \theta_q e_{t+\ell-q} \quad \text{for } \ell > q
\end{aligned}
\tag{9.G.7}
\]

Since \(C_t(\ell)\) contains \(p + d\) arbitrary constants (the \(A\)'s and the \(B\)'s), summing \(C_t(\ell)\) and
\(I_t(\ell)\) yields the general solution of the ARIMA equation. Specific values for the \(A\)'s and
\(B\)'s will be determined by initial conditions on the \(\{Y_t\}\) process.

We note that \(A_d\) is not arbitrary. We have

\[ A_d = \frac{\theta_0}{(1 - \phi_1 - \phi_2 - \cdots - \phi_p)\,d!} \tag{9.G.8} \]

The proof that \(C_t(\ell)\) as given by Equation (9.G.3) is the complementary function and
satisfies Equation (9.G.6) is a standard result from the theory of difference equations
(see, for example, Goldberg, 1958). We shall show that the particular solution \(I_t(\ell)\)
defined by Equation (9.G.2) does satisfy Equation (9.G.7).

Footnote: The only property of \(C_t(\ell)\) that we need is that it depends only on \(Y_t, Y_{t-1}, \ldots\).

For  convenience  of  notation,  we  let  tp j  =  0  for  j  >  p  +  d.  Consider  the  left-hand  side 
of  Equation  (9.G.7).  It  can  be  written  as: 


(Voe,  +  e+  t  +  '"  +  W 1) -<Pt(Vo«,+^_ ,  +  VtV-2  +  '" 

+  V(~2et+ 1) - (PP  +  d('Yoet+e-p-d  ■ 

+  Vi*t  +  e-P-d-i  +  -+Ve-P-d-iet+i\ 


(9.G.9) 


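Now grouping together common $e_t$ terms and picking off their coefficients, we obtain

Coefficient of $e_{t+\ell}$: $\psi_0$

Coefficient of $e_{t+\ell-1}$: $\psi_1 - \varphi_1\psi_0$

Coefficient of $e_{t+\ell-2}$: $\psi_2 - \varphi_1\psi_1 - \varphi_2\psi_0$

$\vdots$

Coefficient of $e_{t+1}$: $\psi_{\ell-1} - \varphi_1\psi_{\ell-2} - \varphi_2\psi_{\ell-3} - \cdots - \varphi_{p+d}\psi_{\ell-p-d-1}$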

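If $\ell > q$, we can match these coefficients to the corresponding coefficients on the right-hand side of Equation (9.G.7) to obtain the relationships

$$\begin{aligned}
\psi_0 &= 1 \\
\psi_1 - \varphi_1\psi_0 &= -\theta_1 \\
\psi_2 - \varphi_1\psi_1 - \varphi_2\psi_0 &= -\theta_2 \\
&\;\;\vdots \\
\psi_q - \varphi_1\psi_{q-1} - \varphi_2\psi_{q-2} - \cdots - \varphi_q\psi_0 &= -\theta_q \\
\psi_{\ell-1} - \varphi_1\psi_{\ell-2} - \varphi_2\psi_{\ell-3} - \cdots - \varphi_{p+d}\psi_{\ell-p-d-1} &= 0 \quad \text{for } \ell > q
\end{aligned} \tag{9.G.10}$$

However, by comparing these relationships with Equation (9.G.5), we see that Equations (9.G.10) are precisely the equations defining the $\psi$-weights and thus Equation (9.G.7) is established as required.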
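The recursion implied by Equations (9.G.10) can be sketched in R. This is a minimal illustration, not code from the book: the function name `psi_weights` and the ARIMA(1,1,1) parameter values are hypothetical, and the result is compared with `ARMAtoMA` from the stats package, which writes MA terms with a plus sign (hence `ma = -theta`).

```r
# psi-weights from varphi(x) * psi(x) = theta(x), using the book's sign
# convention theta(x) = 1 - theta_1 x - ... - theta_q x^q.
psi_weights <- function(varphi, theta, n) {
  psi <- numeric(n + 1)
  psi[1] <- 1                                   # psi_0 = 1
  for (j in seq_len(n)) {
    acc <- if (j <= length(theta)) -theta[j] else 0
    for (i in seq_len(min(j, length(varphi)))) {
      acc <- acc + varphi[i] * psi[j - i + 1]   # varphi_i * psi_{j-i}
    }
    psi[j + 1] <- acc
  }
  psi
}

# Hypothetical ARIMA(1,1,1) with phi = 0.5 and theta = 0.3, so that
# varphi(x) = (1 - 0.5x)(1 - x) = 1 - 1.5x + 0.5x^2.
varphi <- c(1.5, -0.5)
theta <- 0.3
psi <- psi_weights(varphi, theta, 10)           # psi_0, ..., psi_10
psi_check <- ARMAtoMA(ar = varphi, ma = -theta, lag.max = 10)  # psi_1, ..., psi_10
```

Because the generalized AR polynomial has a unit root, these weights do not die out; they settle toward a constant instead.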
Appendix H: State Space Models

Control theory engineers have developed and successfully used so-called state space models and Kalman filtering since Kalman published his seminal work in 1960. Recent references include Durbin and Koopman (2001) and Harvey et al. (2004).

Consider a general stationary and invertible ARMA($p$,$q$) process $\{Z_t\}$. Put $m = \max(p, q + 1)$ and define the state of the process at time $t$ as the column vector $\mathbf{Z}(t)$ of length $m$ whose $j$th element is the forecast $\hat{Z}_t(j)$ for $j = 0, 1, 2, \ldots, m - 1$, based on $Z_t, Z_{t-1}, \ldots$. Note that the lead element of $\mathbf{Z}(t)$ is just $\hat{Z}_t(0) = Z_t$.

Recall the updating Equation (9.6.1) on page 207, which in the present context can be written

$$\hat{Z}_{t+1}(\ell) = \hat{Z}_t(\ell + 1) + \psi_\ell e_{t+1} \tag{9.H.1}$$


We shall use this expression directly for $\ell = 0, 1, 2, \ldots, m - 2$. For $\ell = m - 1$, we have

$$\begin{aligned}
\hat{Z}_{t+1}(m-1) &= \hat{Z}_t(m) + \psi_{m-1} e_{t+1} \\
&= \phi_1 \hat{Z}_t(m-1) + \phi_2 \hat{Z}_t(m-2) + \cdots + \phi_p \hat{Z}_t(m-p) + \psi_{m-1} e_{t+1}
\end{aligned} \tag{9.H.2}$$

where the last expression comes from Equation (9.3.34) on page 200, with $\mu = 0$.

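The matrix formulation of Equations (9.H.1) and (9.H.2) relating $\mathbf{Z}(t + 1)$ to $\mathbf{Z}(t)$ and $e_{t+1}$, called the equations of state (or Akaike's Markovian representation), is given as

$$\mathbf{Z}(t+1) = F\mathbf{Z}(t) + G e_{t+1} \tag{9.H.3}$$

where

$$F = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \phi_m & \phi_{m-1} & \phi_{m-2} & \cdots & \phi_1 \end{bmatrix} \tag{9.H.4}$$

and

$$G = \begin{bmatrix} 1 \\ \psi_1 \\ \psi_2 \\ \vdots \\ \psi_{m-1} \end{bmatrix} \tag{9.H.5}$$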


with $\phi_j = 0$ for $j > p$. Note that the simplicity of Equation (9.H.3) is obtained at the expense of having to deal with vector-valued processes. Because the state space formulation also usually allows for measurement error, we do not observe $Z_t$ directly but only observe $Y_t$ through the observational equation

$$Y_t = H\mathbf{Z}(t) + \varepsilon_t \tag{9.H.6}$$

where $H = [1, 0, 0, \ldots, 0]$ and $\{\varepsilon_t\}$ is another zero-mean white noise process independent of $\{e_t\}$. The special case of no measurement error is obtained by setting $\varepsilon_t = 0$ in Equation (9.H.6). Equivalently, this case is obtained by taking $\sigma_\varepsilon^2 = 0$ in subsequent equations. More general state space models allow $F$, $G$, and $H$ to be more general, possibly also depending on time.
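As a small illustration of Equations (9.H.4) and (9.H.5), the following R sketch builds $F$, $G$, and $H$ for given ARMA coefficients. The function name `state_matrices` and the example values are hypothetical; the $\psi$-weights come from `ARMAtoMA` in the stats package, whose MA sign convention is the opposite of the book's.

```r
# Build the state space matrices for an ARMA(p, q) model written in the
# book's sign convention (MA terms enter with minus signs).
state_matrices <- function(phi = numeric(0), theta = numeric(0)) {
  p <- length(phi)
  q <- length(theta)
  m <- max(p, q + 1)
  phi_pad <- c(phi, rep(0, m - p))               # phi_j = 0 for j > p
  Fm <- matrix(0, m, m)
  if (m > 1) Fm[cbind(1:(m - 1), 2:m)] <- 1      # ones on the superdiagonal
  Fm[m, ] <- rev(phi_pad)                        # last row: phi_m, ..., phi_1
  psi <- if (m > 1) ARMAtoMA(ar = phi, ma = -theta, lag.max = m - 1) else numeric(0)
  list(F = Fm,
       G = c(1, psi),                            # 1, psi_1, ..., psi_{m-1}
       H = matrix(c(1, rep(0, m - 1)), nrow = 1))
}

# Hypothetical ARMA(2,1): phi = (0.5, 0.2), theta = 0.3, so m = 2.
sm <- state_matrices(phi = c(0.5, 0.2), theta = 0.3)
sm$F
sm$G
```

For this example $m = \max(2, 2) = 2$, the last row of $F$ is $(\phi_2, \phi_1)$, and $\psi_1 = \phi_1 - \theta_1$.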
Evaluation of the Likelihood Function and Kalman Filtering

First a definition: The covariance matrix for a vector of random variables $X$ of dimension $n \times 1$ is defined to be the $n \times n$ matrix whose $ij$th element is the covariance between the $i$th and $j$th components of $X$.

If $Y = AX + B$, then it is easily shown that the covariance matrix for $Y$ is $AVA^T$, where $V$ is the covariance matrix for $X$ and the superscript $T$ denotes matrix transpose.
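The identity $\text{Cov}(AX + B) = AVA^T$ also holds exactly for sample covariance matrices, which gives a quick numerical check. The matrices and the simulated data below are made up purely for illustration.

```r
set.seed(1)
X <- matrix(rnorm(300), ncol = 3)          # 100 draws of a 3-vector; rows are draws
A <- matrix(c(1, 2,  0,
              0, 1, -1), nrow = 2, byrow = TRUE)
B <- c(5, -3)
Y <- sweep(X %*% t(A), 2, B, "+")          # each row is A x + B
V <- cov(X)                                # sample covariance matrix of X
lhs <- cov(Y)                              # sample covariance matrix of Y
rhs <- A %*% V %*% t(A)                    # A V A^T
max(abs(lhs - rhs))                        # zero up to rounding error
```

The shift $B$ drops out because covariances are computed about the mean.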

Getting back to the Kalman filter, we let $\mathbf{Z}(t+1|t)$ denote the $m \times 1$ vector whose $j$th component is $E[\hat{Z}_{t+1}(j) \mid Y_t, Y_{t-1}, \ldots, Y_1]$ for $j = 0, 1, 2, \ldots, m - 1$. Similarly, let $\mathbf{Z}(t|t)$ be the vector whose $j$th component is $E[\hat{Z}_t(j) \mid Y_t, Y_{t-1}, \ldots, Y_1]$ for $j = 0, 1, 2, \ldots, m - 1$.

Then, since $e_{t+1}$ is independent of $Z_t, Z_{t-1}, \ldots$, and hence also of $Y_t, Y_{t-1}, \ldots$, we see from Equation (9.H.3) that

$$\mathbf{Z}(t+1|t) = F\mathbf{Z}(t|t) \tag{9.H.7}$$

Also letting $P(t+1|t)$ be the covariance matrix for the "forecast error" $\mathbf{Z}(t+1) - \mathbf{Z}(t+1|t)$ and $P(t|t)$ be the covariance matrix for the "forecast error" $\mathbf{Z}(t) - \mathbf{Z}(t|t)$, we have from Equation (9.H.3) that

$$P(t+1|t) = F[P(t|t)]F^T + \sigma_e^2 GG^T \tag{9.H.8}$$

From the observational equation (Equation (9.H.6)) with $t$ replaced by $t + 1$,

$$Y(t+1|t) = H\mathbf{Z}(t+1|t) \tag{9.H.9}$$

where $Y(t+1|t) = E(Y_{t+1} \mid Y_t, Y_{t-1}, \ldots, Y_1)$.

It can now be shown that the following relationships hold (see, for example, Harvey, 1981c):

$$\mathbf{Z}(t+1|t+1) = \mathbf{Z}(t+1|t) + K(t+1)[Y_{t+1} - Y(t+1|t)] \tag{9.H.10}$$

where

$$K(t+1) = P(t+1|t)H^T[HP(t+1|t)H^T + \sigma_\varepsilon^2]^{-1} \tag{9.H.11}$$

and

$$P(t+1|t+1) = P(t+1|t) - K(t+1)HP(t+1|t) \tag{9.H.12}$$

Collectively, Equations (9.H.10), (9.H.11), and (9.H.12) are referred to as the Kalman filter equations. The quantity

$$err_{t+1} = Y_{t+1} - Y(t+1|t) \tag{9.H.13}$$

in Equation (9.H.10) is the prediction error and is independent of (or at least uncorrelated with) the past observations $Y_t, Y_{t-1}, \ldots$. Since we are allowing for measurement error, $err_{t+1}$ is not, in general, the same as $e_{t+1}$.

From Equations (9.H.13) and (9.H.6), we have

$$v_{t+1} = Var(err_{t+1}) = HP(t+1|t)H^T + \sigma_\varepsilon^2 \tag{9.H.14}$$
Now consider the likelihood function for the observed series $Y_1, Y_2, \ldots, Y_n$. From the definition of the conditional probability density function, we can write

$$f(y_1, y_2, \ldots, y_n) = f(y_n \mid y_1, y_2, \ldots, y_{n-1})\, f(y_1, y_2, \ldots, y_{n-1})$$

or, by taking logs,

$$\log f(y_1, y_2, \ldots, y_n) = \log f(y_1, y_2, \ldots, y_{n-1}) + \log f(y_n \mid y_1, y_2, \ldots, y_{n-1}) \tag{9.H.15}$$

Assume now that we are dealing with normal distributions, that is, that $\{e_t\}$ and $\{\varepsilon_t\}$ are normal white noise processes. Then it is known that the distribution of $Y_n$ conditional on $Y_1 = y_1, Y_2 = y_2, \ldots, Y_{n-1} = y_{n-1}$ is also normal with mean $y(n|n-1)$ and variance $v_n$. In the remainder of this section and the next, we write $y(n|n-1)$ for the observed value of $Y(n|n-1)$. The second term on the right-hand side of Equation (9.H.15) can then be written

$$\log f(y_n \mid y_1, y_2, \ldots, y_{n-1}) = -\frac{1}{2}\log 2\pi - \frac{1}{2}\log v_n - \frac{1}{2}\frac{[y_n - y(n|n-1)]^2}{v_n}$$


Furthermore, the first term on the right-hand side of Equation (9.H.15) can be decomposed similarly again and again until we have

$$\log f(y_1, y_2, \ldots, y_n) = \sum_{t=2}^{n} \log f(y_t \mid y_1, y_2, \ldots, y_{t-1}) + \log f(y_1) \tag{9.H.16}$$

which then becomes the prediction error decomposition of the likelihood, namely

$$\log f(y_1, y_2, \ldots, y_n) = -\frac{n}{2}\log 2\pi - \frac{1}{2}\sum_{t=1}^{n} \log v_t - \frac{1}{2}\sum_{t=1}^{n} \frac{[y_t - y(t|t-1)]^2}{v_t} \tag{9.H.17}$$

with $y(1|0) = 0$ and $v_1 = Var(Y_1)$.

The overall strategy for computing the likelihood for a given set of parameter values is to use the Kalman filter equations to generate recursively the prediction errors and their variances and then use the prediction error decomposition of the likelihood function. Only one point remains: We need initial values $\mathbf{Z}(0|0)$ and $P(0|0)$ to get the recursions started.

The  Initial  State  Covariance  Matrix 

The initial state vector $\mathbf{Z}(0|0)$ will be a vector of zeros for a zero-mean process, and $P(0|0)$ is the covariance matrix for $\mathbf{Z}(0) - \mathbf{Z}(0|0) = \mathbf{Z}(0)$. Now, because $\mathbf{Z}(0)$ is the column vector with elements $[Z_0, \hat{Z}_0(1), \ldots, \hat{Z}_0(m-1)]$, it is necessary for us to evaluate

$$Cov[\hat{Z}_0(i), \hat{Z}_0(j)] \quad \text{for } i, j = 0, 1, \ldots, m - 1$$

From the truncated linear process form, Equation (9.3.35) on page 200 with $C_t(\ell) = \hat{Z}_t(\ell)$, we may write, for $j > 0$

$$Z_j = \hat{Z}_0(j) + \sum_{k=-j}^{-1} \psi_{j+k} e_{-k} \tag{9.H.18}$$

Multiplying Equation (9.H.18) by $Z_0$ and taking expected values yields

$$\gamma_j = E(Z_0 Z_j) = E[\hat{Z}_0(0)\hat{Z}_0(j)] \quad \text{for } j \ge 0 \tag{9.H.19}$$

Now multiply Equation (9.H.18) by itself with $j$ replaced by $i$ and take expected values. Recalling that the $e$'s are independent of past $Z$'s and assuming $0 < i \le j$, we obtain

$$\gamma_{j-i} = Cov[\hat{Z}_0(i), \hat{Z}_0(j)] + \sigma_e^2 \sum_{k=0}^{i-1} \psi_k \psi_{k+j-i} \tag{9.H.20}$$


Combining Equations (9.H.19) and (9.H.20), we have as the required elements of $P(0|0)$

$$Cov[\hat{Z}_0(i), \hat{Z}_0(j)] = \begin{cases} \gamma_j & 0 = i \le j \le m - 1 \\ \gamma_{j-i} - \sigma_e^2 \displaystyle\sum_{k=0}^{i-1} \psi_k \psi_{k+j-i} & 1 \le i \le j \le m - 1 \end{cases} \tag{9.H.21}$$

where the $\psi$-weights are obtained from the recursion of Equation (4.4.7) on page 79, and $\gamma_k$, the autocovariance function for the $\{Z_t\}$ process, is obtained as in Appendix C on page 85.

The variance $\sigma_e^2$ can be removed from the problem by dividing $\sigma_\varepsilon^2$ by $\sigma_e^2$. The prediction error variance $v_t$ is then replaced by $\sigma_e^2 v_t$ in the log-likelihood of Equation (9.H.17), and we set $\sigma_e^2 = 1$ in Equation (9.H.8). Dropping unneeded constants, we get the new log-likelihood

$$\ell = \sum_{t=1}^{n} \left\{ \log(\sigma_e^2 v_t) + \frac{[y_t - y(t|t-1)]^2}{\sigma_e^2 v_t} \right\} \tag{9.H.22}$$

which can be minimized analytically with respect to $\sigma_e^2$. We obtain

$$\hat{\sigma}_e^2 = \frac{1}{n}\sum_{t=1}^{n} \frac{[y_t - y(t|t-1)]^2}{v_t} \tag{9.H.23}$$

Substituting this back into Equation (9.H.22), we now find that

$$\ell = \sum_{t=1}^{n} \log v_t + n \log \sum_{t=1}^{n} \frac{[y_t - y(t|t-1)]^2}{v_t} \tag{9.H.24}$$

which must be minimized numerically with respect to $\phi_1, \phi_2, \ldots, \phi_p$, $\theta_1, \theta_2, \ldots, \theta_q$, and $\sigma_\varepsilon^2$. Having done so, we return to Equation (9.H.23) to estimate $\sigma_e^2$. The function defined by Equation (9.H.24) is sometimes called the concentrated log-likelihood function.
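To make the recursions concrete, here is a minimal R sketch of the Kalman filter likelihood for an ARMA(1,1) process with no measurement error ($\sigma_\varepsilon^2 = 0$). It is an illustration under stated assumptions rather than the book's code: the function name `kalman_loglik`, the parameter values, and the simulated series are hypothetical, and the closed-form ARMA(1,1) autocovariances (with $\sigma_e^2 = 1$) are used to fill in $P(0|0)$ from Equation (9.H.21).

```r
# Exact Gaussian log-likelihood of a zero-mean ARMA(1,1),
# Z_t = phi Z_{t-1} + e_t - theta e_{t-1}, via Equations (9.H.7)-(9.H.14),
# with sigma_e^2 concentrated out as in Equation (9.H.23).
kalman_loglik <- function(y, phi, theta) {
  n <- length(y)
  Fm <- matrix(c(0, 1,
                 0, phi), 2, 2, byrow = TRUE)      # (9.H.4), m = 2, phi_2 = 0
  G <- c(1, phi - theta)                           # (9.H.5), psi_1 = phi - theta
  H <- matrix(c(1, 0), nrow = 1)
  g0 <- (1 - 2 * phi * theta + theta^2) / (1 - phi^2)      # gamma_0 / sigma_e^2
  g1 <- (1 - phi * theta) * (phi - theta) / (1 - phi^2)    # gamma_1 / sigma_e^2
  P <- matrix(c(g0, g1, g1, g0 - 1), 2, 2)         # P(0|0) from (9.H.21)
  z <- c(0, 0)                                     # Z(0|0) for a zero-mean process
  err <- v <- numeric(n)
  for (i in seq_len(n)) {
    z_pred <- Fm %*% z                             # (9.H.7)
    P_pred <- Fm %*% P %*% t(Fm) + outer(G, G)     # (9.H.8) with sigma_e^2 = 1
    v[i] <- drop(H %*% P_pred %*% t(H))            # (9.H.14), no measurement error
    err[i] <- y[i] - drop(H %*% z_pred)            # (9.H.13)
    K <- P_pred %*% t(H) / v[i]                    # (9.H.11)
    z <- z_pred + K * err[i]                       # (9.H.10)
    P <- P_pred - K %*% H %*% P_pred               # (9.H.12)
  }
  s2 <- mean(err^2 / v)                            # (9.H.23)
  loglik <- -0.5 * (n * log(2 * pi) + sum(log(s2 * v)) + n)  # (9.H.17) at s2
  list(loglik = loglik, sigma2 = s2)
}

set.seed(123)
y <- as.numeric(arima.sim(model = list(ar = 0.6, ma = -0.4), n = 200))
res <- kalman_loglik(y, phi = 0.6, theta = 0.4)
res$loglik
```

In practice one would pass a function like this to a numerical optimizer over $(\phi, \theta)$; evaluating it at fixed parameter values, as here, is enough to see the prediction error decomposition at work.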
Seasonal Models


In Chapter 3, we saw how seasonal deterministic trends might be modeled. However, in many areas in which time series are used, particularly business and economics, the assumption of any deterministic trend is quite suspect even though cyclical tendencies are very common in such series.

Here is an example: Levels of carbon dioxide (CO2) are monitored at several sites around the world to investigate atmospheric changes. One of the sites is at Alert, Northwest Territories, Canada, near the Arctic Circle.

Exhibit 10.1 displays the monthly CO2 levels from January 1994 through December 2004. There is a strong upward trend but also a seasonality that can be seen better in the more detailed Exhibit 10.2, where only the last few years are graphed using monthly plotting symbols.


Exhibit  10.1  Monthly  Carbon  Dioxide  Levels  at  Alert,  NWT,  Canada 


> data(co2)

> win.graph(width=4.875,height=3,pointsize=8)

> plot(co2,ylab='CO2')
As we see in the displays, carbon dioxide levels are higher during the winter months and much lower in the summer. Deterministic seasonal models such as seasonal means plus linear time trend or sums of cosine curves at various frequencies plus linear time trend as we investigated in Chapter 3 could certainly be considered here. But we discover that such models do not explain the behavior of this time series. For this series and many others, it can be shown that the residuals from a seasonal means plus linear time trend model are highly autocorrelated at many lags.† In contrast, we will see that the stochastic seasonal models developed in this chapter do work well for this series.


Exhibit 10.2 Carbon Dioxide Levels with Monthly Symbols

> plot(window(co2,start=c(2000,1)),ylab='CO2')

> Month=c('J','F','M','A','M','J','J','A','S','O','N','D')

> points(window(co2,start=c(2000,1)),pch=Month)
We begin by studying stationary models and then consider nonstationary generalizations in Section 10.3. We let $s$ denote the known seasonal period; for monthly series $s = 12$ and for quarterly series $s = 4$.

Consider the time series generated according to

$$Y_t = e_t - \Theta e_{t-12}$$

Notice that

$$Cov(Y_t, Y_{t-1}) = Cov(e_t - \Theta e_{t-12},\; e_{t-1} - \Theta e_{t-13}) = 0$$

but that

$$Cov(Y_t, Y_{t-12}) = Cov(e_t - \Theta e_{t-12},\; e_{t-12} - \Theta e_{t-24}) = -\Theta\sigma_e^2$$

It is easy to see that such a series is stationary and has nonzero autocorrelations only at lag 12.

† We ask you to verify this in Exercise 10.8.

Generalizing these ideas, we define a seasonal MA($Q$) model of order $Q$ with seasonal period $s$ by

$$Y_t = e_t - \Theta_1 e_{t-s} - \Theta_2 e_{t-2s} - \cdots - \Theta_Q e_{t-Qs} \tag{10.1.1}$$

with seasonal MA characteristic polynomial

$$\Theta(x) = 1 - \Theta_1 x^s - \Theta_2 x^{2s} - \cdots - \Theta_Q x^{Qs} \tag{10.1.2}$$

It is evident that such a series is always stationary and that the autocorrelation function will be nonzero only at the seasonal lags of $s, 2s, 3s, \ldots, Qs$. In particular,

$$\rho_{ks} = \frac{-\Theta_k + \Theta_1\Theta_{k+1} + \Theta_2\Theta_{k+2} + \cdots + \Theta_{Q-k}\Theta_Q}{1 + \Theta_1^2 + \Theta_2^2 + \cdots + \Theta_Q^2} \quad \text{for } k = 1, 2, \ldots, Q \tag{10.1.3}$$

(Compare this with Equation (4.2.5) on page 65 for the nonseasonal MA process.) For the model to be invertible, the roots of $\Theta(x) = 0$ must all exceed 1 in absolute value.

It is useful to note that the seasonal MA($Q$) model can also be viewed as a special case of a nonseasonal MA model of order $q = Qs$ but with all $\theta$-values zero except at the seasonal lags $s, 2s, 3s, \ldots, Qs$.
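Viewing the seasonal MA model as a sparse nonseasonal MA model makes it easy to check its autocorrelations numerically. The sketch below uses `ARMAacf` from the stats package with a hypothetical value $\Theta = 0.8$ and $s = 12$; note that R writes MA coefficients with a plus sign, so the book's $\Theta$ enters as `-Theta`.

```r
Theta <- 0.8                                  # hypothetical seasonal MA coefficient
ma <- c(rep(0, 11), -Theta)                   # MA(12) with only the lag-12 term nonzero
rho <- ARMAacf(ma = ma, lag.max = 24)         # theoretical ACF, named by lag 0..24
nonzero_lags <- as.integer(names(rho)[abs(rho) > 1e-12])
rho12_formula <- -Theta / (1 + Theta^2)       # Equation (10.1.3) with Q = 1, k = 1
nonzero_lags                                  # lags with nonzero autocorrelation
unname(rho["12"])
```

Only lags 0 and 12 survive, and the lag-12 value agrees with Equation (10.1.3).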

Seasonal autoregressive models can also be defined. Consider

$$Y_t = \Phi Y_{t-12} + e_t \tag{10.1.4}$$

where $|\Phi| < 1$ and $e_t$ is independent of $Y_{t-1}, Y_{t-2}, \ldots$. It can be shown that $|\Phi| < 1$ ensures stationarity. Thus it is easy to argue that $E(Y_t) = 0$; multiplying Equation (10.1.4) by $Y_{t-k}$, taking expectations, and dividing by $\gamma_0$ yields

$$\rho_k = \Phi\rho_{k-12} \quad \text{for } k \ge 1 \tag{10.1.5}$$

Clearly

$$\rho_{12} = \Phi\rho_0 = \Phi \quad \text{and} \quad \rho_{24} = \Phi\rho_{12} = \Phi^2$$

More generally,

$$\rho_{12k} = \Phi^k \quad \text{for } k = 1, 2, \ldots \tag{10.1.6}$$

Furthermore, setting $k = 1$ and then $k = 11$ in Equation (10.1.5) and using $\rho_k = \rho_{-k}$ gives us

$$\rho_1 = \Phi\rho_{11} \quad \text{and} \quad \rho_{11} = \Phi\rho_1$$

which implies that $\rho_1 = \rho_{11} = 0$. Similarly, one can show that $\rho_k = 0$ except at the seasonal lags 12, 24, 36, .... At those lags, the autocorrelation function decays exponentially like an AR(1) model.
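The pattern in Equation (10.1.6) can be confirmed numerically by treating the seasonal AR(1) model as an AR(12) model with a single nonzero coefficient. The value $\Phi = 0.75$ below is an arbitrary choice for illustration; `ARMAacf` is from the stats package.

```r
Phi <- 0.75                                          # hypothetical seasonal AR coefficient
rho <- ARMAacf(ar = c(rep(0, 11), Phi), lag.max = 36)
seasonal <- rho[c("12", "24", "36")]                 # should equal Phi, Phi^2, Phi^3
others <- rho[setdiff(names(rho), c("0", "12", "24", "36"))]
unname(seasonal)
max(abs(others))                                     # zero up to rounding error
```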



With this example in mind, we define a seasonal AR($P$) model of order $P$ and seasonal period $s$ by

$$Y_t = \Phi_1 Y_{t-s} + \Phi_2 Y_{t-2s} + \cdots + \Phi_P Y_{t-Ps} + e_t \tag{10.1.7}$$

with seasonal characteristic polynomial

$$\Phi(x) = 1 - \Phi_1 x^s - \Phi_2 x^{2s} - \cdots - \Phi_P x^{Ps} \tag{10.1.8}$$

As always, we require $e_t$ to be independent of $Y_{t-1}, Y_{t-2}, \ldots$, and, for stationarity, that the roots of $\Phi(x) = 0$ be greater than 1 in absolute value. Again, Equation (10.1.7) can be seen as a special AR($p$) model of order $p = Ps$ with nonzero $\phi$-coefficients only at the seasonal lags $s, 2s, 3s, \ldots, Ps$.

It can be shown that the autocorrelation function is nonzero only at lags $s, 2s, 3s, \ldots$, where it behaves like a combination of decaying exponentials and damped sine functions. In particular, Equations (10.1.4), (10.1.5), and (10.1.6) easily generalize to the general seasonal AR(1) model to give

$$\rho_{ks} = \Phi^k \quad \text{for } k = 1, 2, \ldots \tag{10.1.9}$$

with zero correlation at other lags.

10.2  Multiplicative  Seasonal  ARMA  Models 


Rarely  shall  we  need  models  that  incorporate  autocorrelation  only  at  the  seasonal  lags. 
By  combining  the  ideas  of  seasonal  and  nonseasonal  ARMA  models,  we  can  develop 
parsimonious  models  that  contain  autocorrelation  for  the  seasonal  lags  but  also  for  low 
lags  of  neighboring  series  values. 

Consider a model whose MA characteristic polynomial is given by

$$(1 - \theta x)(1 - \Theta x^{12})$$

Multiplying out, we have $1 - \theta x - \Theta x^{12} + \theta\Theta x^{13}$. Thus the corresponding time series satisfies

$$Y_t = e_t - \theta e_{t-1} - \Theta e_{t-12} + \theta\Theta e_{t-13} \tag{10.2.1}$$

For this model, we can check that the autocorrelation function is nonzero only at lags 1, 11, 12, and 13. We find

$$\gamma_0 = (1 + \theta^2)(1 + \Theta^2)\sigma_e^2 \tag{10.2.2}$$

$$\rho_1 = -\frac{\theta}{1 + \theta^2} \tag{10.2.3}$$

$$\rho_{11} = \rho_{13} = \frac{\theta\Theta}{(1 + \theta^2)(1 + \Theta^2)} \tag{10.2.4}$$

and

$$\rho_{12} = -\frac{\Theta}{1 + \Theta^2} \tag{10.2.5}$$

Exhibit 10.3 displays the autocorrelation functions for the model of Equation (10.2.1) with $\theta = \pm 0.5$ and $\Theta = -0.8$ as given by Equations (10.2.2)-(10.2.5).
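Equations (10.2.3) through (10.2.5) can be checked by multiplying the two MA polynomials and handing the product to `ARMAacf`. This is only a sketch: the helper `poly_mult` is a hypothetical name, and the parameter pair $\theta = 0.5$, $\Theta = -0.8$ is one of those used for Exhibit 10.3.

```r
# Multiply two polynomials given as coefficient vectors (constant term first).
poly_mult <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    idx <- i:(i + length(b) - 1)
    out[idx] <- out[idx] + a[i] * b
  }
  out
}

theta <- 0.5
Theta <- -0.8
# (1 - theta x)(1 - Theta x^12) = 1 - theta x - Theta x^12 + theta Theta x^13
ma_poly <- poly_mult(c(1, -theta), c(1, rep(0, 11), -Theta))
rho <- ARMAacf(ma = ma_poly[-1], lag.max = 14)   # R's MA polynomial is 1 + b_1 x + ...

rho1  <- -theta / (1 + theta^2)                          # (10.2.3)
rho11 <- theta * Theta / ((1 + theta^2) * (1 + Theta^2)) # (10.2.4), also rho_13
rho12 <- -Theta / (1 + Theta^2)                          # (10.2.5)
round(rho[c("1", "11", "12", "13")], 4)
```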


Exhibit 10.3 Autocorrelations from Equations (10.2.2)-(10.2.5)

[Figure: autocorrelation panels plotted against Lag $k$ for lags 1 to 13; surviving panel title: $\theta = -0.5$, $\Theta = -0.8$]

Of course, we could also introduce both short-term and seasonal autocorrelations by defining an MA model of order 12 with only $\theta_1$ and $\theta_{12}$ nonzero. We shall see in the next section that the "multiplicative" model arises quite naturally for nonstationary models that entail differencing.

In general, then, we define a multiplicative seasonal ARMA$(p,q)\times(P,Q)_s$ model with seasonal period $s$ as a model with AR characteristic polynomial $\phi(x)\Phi(x)$ and MA characteristic polynomial $\theta(x)\Theta(x)$, where

$$\begin{aligned}
\phi(x) &= 1 - \phi_1 x - \phi_2 x^2 - \cdots - \phi_p x^p \\
\Phi(x) &= 1 - \Phi_1 x^s - \Phi_2 x^{2s} - \cdots - \Phi_P x^{Ps}
\end{aligned} \tag{10.2.6}$$

and

$$\begin{aligned}
\theta(x) &= 1 - \theta_1 x - \theta_2 x^2 - \cdots - \theta_q x^q \\
\Theta(x) &= 1 - \Theta_1 x^s - \Theta_2 x^{2s} - \cdots - \Theta_Q x^{Qs}
\end{aligned} \tag{10.2.7}$$

The model may also contain a constant term $\theta_0$. Note once more that we have just a special ARMA model with AR order $p + Ps$ and MA order $q + Qs$, but the coefficients are not completely general, being determined by only $p + P + q + Q$ coefficients. If $s = 12$, $p + P + q + Q$ will be considerably smaller than $p + Ps + q + Qs$ and will allow a much more parsimonious model.

As another example, suppose $P = q = 1$ and $p = Q = 0$ with $s = 12$. The model is then

$$Y_t = \Phi Y_{t-12} + e_t - \theta e_{t-1} \tag{10.2.8}$$

Using our standard techniques, we find that

$$\gamma_1 = \Phi\gamma_{11} - \theta\sigma_e^2 \tag{10.2.9}$$

and

$$\gamma_k = \Phi\gamma_{k-12} \quad \text{for } k \ge 2 \tag{10.2.10}$$

After considering the equations implied by various choices for $k$, we arrive at

$$\begin{aligned}
\rho_{12k} &= \Phi^k \quad \text{for } k \ge 1 \\
\rho_{12k-1} = \rho_{12k+1} &= \left(-\frac{\theta}{1 + \theta^2}\right)\Phi^k \quad \text{for } k = 0, 1, 2, \ldots
\end{aligned} \tag{10.2.11}$$

with autocorrelations for all other lags equal to zero.
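The autocorrelation pattern in Equation (10.2.11) can be verified numerically. The sketch below assumes the parameter values $\Phi = 0.75$ and $\theta = 0.4$ and uses `ARMAacf` from the stats package, with the MA coefficient entered as `-theta` because of R's sign convention.

```r
Phi <- 0.75
theta <- 0.4
# Y_t = Phi Y_{t-12} + e_t - theta e_{t-1}, written as an ARMA(12, 1) model
rho <- ARMAacf(ar = c(rep(0, 11), Phi), ma = -theta, lag.max = 37)

base <- -theta / (1 + theta^2)
at_seasonal <- unname(rho[c("12", "24", "36")])        # Phi^k, k = 1, 2, 3
just_after  <- unname(rho[c("1", "13", "25", "37")])   # base * Phi^k, k = 0, ..., 3
just_before <- unname(rho[c("11", "23", "35")])        # base * Phi^k, k = 1, 2, 3
round(rho[c("1", "11", "12", "13")], 4)
```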

Exhibit 10.4 displays the autocorrelation functions for two of these seasonal ARIMA processes with period 12: one with $\Phi = 0.75$ and $\theta = 0.4$, the other with $\Phi = 0.75$ and $\theta = -0.4$. The shape of these autocorrelations is somewhat typical of the sample autocorrelation functions for numerous seasonal time series. The even simpler autocorrelation function given by Equations (10.2.3), (10.2.4), and (10.2.5) and displayed in Exhibit 10.3 also seems to occur frequently in practice (perhaps after differencing).

[Exhibit 10.4: two autocorrelation panels plotted against Lag $k$]
10.3 Nonstationary Seasonal ARIMA Models


An important tool in modeling nonstationary seasonal processes is the seasonal difference. The seasonal difference of period $s$ for the series $\{Y_t\}$ is denoted $\nabla_s Y_t$ and is defined as

$$\nabla_s Y_t = Y_t - Y_{t-s} \tag{10.3.1}$$

For example, for monthly series we consider the changes from January to January, February to February, and so forth for successive years. Note that for a series of length $n$, the seasonal difference series will be of length $n - s$; that is, $s$ data values are lost due to seasonal differencing.
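In R the seasonal difference of Equation (10.3.1) is computed with `diff(..., lag = s)`. The short sketch below uses a made-up monthly series (a linear trend plus a fixed seasonal pattern) to show both the loss of $s = 12$ values and the removal of the repeating seasonal pattern.

```r
# Made-up monthly series: trend of 1 per month plus a fixed seasonal pattern.
pattern <- c(5, -2, 0, 3, 1, -4, 2, 0, -1, 4, -3, -5)
y <- ts(1:36 + rep(pattern, 3), start = c(2001, 1), frequency = 12)

d12 <- diff(y, lag = 12)       # Y_t - Y_{t-12}
length(y)                      # 36
length(d12)                    # 24: the first 12 values are lost
unique(as.numeric(d12))        # 12: the seasonal pattern cancels, the trend leaves a constant
```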

As an example where seasonal differencing is appropriate, consider a process generated according to

$$Y_t = S_t + e_t \tag{10.3.2}$$

with

$$S_t = S_{t-s} + \varepsilon_t \tag{10.3.3}$$

where $\{e_t\}$ and $\{\varepsilon_t\}$ are independent white noise series. Here $\{S_t\}$ is a "seasonal random walk," and if $\sigma_\varepsilon \ll \sigma_e$, $\{S_t\}$ would model a slowly changing seasonal component.

Due to the nonstationarity of $\{S_t\}$, clearly $\{Y_t\}$ is nonstationary. However, if we seasonally difference $\{Y_t\}$, as given in Equation (10.3.1), we find

$$\begin{aligned}
\nabla_s Y_t &= S_t - S_{t-s} + e_t - e_{t-s} \\
&= \varepsilon_t + e_t - e_{t-s}
\end{aligned} \tag{10.3.4}$$

An easy calculation shows that $\nabla_s Y_t$ is stationary and has the autocorrelation function of an MA(1)$_s$ model.

The model described by Equations (10.3.2) and (10.3.3) could also be generalized to account for a nonseasonal, slowly changing stochastic trend. Consider

$$Y_t = M_t + S_t + e_t \tag{10.3.5}$$

with

$$S_t = S_{t-s} + \varepsilon_t \tag{10.3.6}$$

and

$$M_t = M_{t-1} + \xi_t \tag{10.3.7}$$

where $\{e_t\}$, $\{\varepsilon_t\}$, and $\{\xi_t\}$ are mutually independent white noise series. Here we take both a seasonal difference and an ordinary nonseasonal difference to obtain†

$$\begin{aligned}
\nabla\nabla_s Y_t &= \nabla(M_t - M_{t-s} + \varepsilon_t + e_t - e_{t-s}) \\
&= (\xi_t + \varepsilon_t + e_t) - (\varepsilon_{t-1} + e_{t-1}) - (\xi_{t-s} + e_{t-s}) + e_{t-s-1}
\end{aligned} \tag{10.3.8}$$

The process defined here is stationary and has nonzero autocorrelation only at lags 1, $s - 1$, $s$, and $s + 1$, which agrees with the autocorrelation structure of the multiplicative seasonal model ARMA$(0,1)\times(0,1)$ with seasonal period $s$.

† It should be noted that $\nabla_s Y_t$ will in fact be stationary and $\nabla\nabla_s Y_t$ will be noninvertible. We use Equations (10.3.5), (10.3.6), and (10.3.7) merely to help motivate multiplicative seasonal ARIMA models.

These examples lead to the definition of nonstationary seasonal models. A process $\{Y_t\}$ is said to be a multiplicative seasonal ARIMA model with nonseasonal (regular) orders $p$, $d$, and $q$, seasonal orders $P$, $D$, and $Q$, and seasonal period $s$ if the differenced series

$$W_t = \nabla^d \nabla_s^D Y_t \tag{10.3.9}$$

satisfies an ARMA$(p,q)\times(P,Q)_s$ model with seasonal period $s$.† We say that $\{Y_t\}$ is an ARIMA$(p,d,q)\times(P,D,Q)_s$ model with seasonal period $s$.

Clearly, such models represent a broad, flexible class from which to select an appropriate model for a particular time series. It has been found empirically that many series can be adequately fit by these models, usually with a small number of parameters, say three or four.

10.4  Model  Specification,  Fitting,  and  Checking 


Model specification, fitting, and diagnostic checking for seasonal models follow the same general techniques developed in Chapters 6, 7, and 8. Here we shall simply highlight the application of these ideas specifically to seasonal models and pay special attention to the seasonal lags.

Model Specification

As always, a careful inspection of the time series plot is the first step. Exhibit 10.1 on page 227 displays monthly carbon dioxide levels in northern Canada. The upward trend alone would lead us to specify a nonstationary model. Exhibit 10.5 shows the sample autocorrelation function for that series. The seasonal autocorrelation relationships are shown quite prominently in this display. Notice the strong correlation at lags 12, 24, 36, and so on. In addition, there is substantial other correlation that needs to be modeled.

† Using the backshift operator notation of Appendix D, page 106, we may write the general ARIMA$(p,d,q)\times(P,D,Q)_s$ model as $\phi(B)\Phi(B)\nabla^d\nabla_s^D Y_t = \theta(B)\Theta(B)e_t$.
Exhibit 10.5 Sample ACF of CO2 Levels


Exhibit 10.6 shows the time series plot of the CO2 levels after we take a first difference.


Exhibit 10.6 Time Series Plot of the First Differences of CO2 Levels

Exhibit 10.7 Sample ACF of First Differences of CO2 Levels

> acf(as.vector(diff(co2)),lag.max=36)

Exhibit 10.8 displays the time series plot of the CO2 levels after taking both a first difference and a seasonal difference. It appears that most, if not all, of the seasonality is gone now.
$$\nabla_{12}\nabla Y_t = e_t - \theta e_{t-1} - \Theta e_{t-12} + \theta\Theta e_{t-13} \tag{10.4.10}$$

which incorporates many of these requirements. As usual, all models are tentative and subject to revision at the diagnostics stage of model building.


Exhibit 10.9 Sample ACF of First and Seasonal Differences of CO2

> acf(as.vector(diff(diff(co2),lag=12)),lag.max=36,ci.type='ma')


Model  Fitting 

Having  specified  a  tentative  seasonal  model  for  a  particular  time  series,  we  proceed  to 
estimate  the  parameters  of  that  model  as  efficiently  as  possible.  As  we  have  remarked 
earlier,  multiplicative  seasonal  ARIMA  models  are  just  special  cases  of  our  general 
ARIMA  models.  As  such,  all  of  our  work  on  parameter  estimation  in  Chapter  7  carries 
over  to  the  seasonal  case. 

Exhibit 10.10 gives the maximum likelihood estimates and their standard errors for the ARIMA$(0,1,1)\times(0,1,1)_{12}$ model for CO2 levels.


Exhibit 10.10 Parameter Estimates for the CO2 Model

Coefficient       θ        Θ
Estimate          0.5792   0.8206
Standard error    0.0791   0.1137

$\hat{\sigma}_e^2 = 0.5446$: log-likelihood = -139.54, AIC = 283.08

> m1.co2=arima(co2,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=12))

> m1.co2
The coefficient estimates are all highly significant, and we proceed to check further on this model.

Diagnostic Checking

To check the estimated ARIMA$(0,1,1)\times(0,1,1)_{12}$ model, we first look at the time series plot of the residuals. Exhibit 10.11 gives this plot for standardized residuals. Other than some strange behavior in the middle of the series, this plot does not suggest any major irregularities with the model, although we may need to investigate the model further for outliers, as the standardized residual at September 1998 looks suspicious. We investigate this further in Chapter 11.


Exhibit 10.11 Residuals from the ARIMA(0,1,1)x(0,1,1)12 Model

> plot(window(rstandard(m1.co2),start=c(1995,2)),ylab='Standardized Residuals',type='o')

> abline(h=0)


To look further, we graph the sample ACF of the residuals in Exhibit 10.12. The only "statistically significant" correlation is at lag 22, and this correlation has a value of only -0.17, a very small correlation. Furthermore, we can think of no reasonable interpretation for dependence at lag 22. Finally, we should not be surprised that one autocorrelation out of the 36 displayed is statistically significant. This could easily happen by chance alone. Except for marginal significance at lag 22, the model seems to have captured the essence of the dependence in the series.
Exhibit 10.12 ACF of Residuals from the ARIMA(0,1,1)x(0,1,1)12 Model

> acf(as.vector(window(rstandard(m1.co2),start=c(1995,2))),lag.max=36)


The Ljung-Box test for this model gives a chi-squared value of 25.59 with 22 degrees of freedom, leading to a p-value of 0.27, a further indication that the model has captured the dependence in the time series.

Next  we  investigate  the  question  of  normality  of  the  error  terms  via  the  residuals. 
Exhibit  10.13  displays  the  histogram  of  the  residuals.  The  shape  is  somewhat 
“bell-shaped”  but  certainly  not  ideal.  Perhaps  a  quantile-quantile  plot  will  tell  us  more. 


Exhibit 10.13 Residuals from the ARIMA(0,1,1)x(0,1,1)12 Model

[Histogram; horizontal axis: Standardized Residuals]

> win.graph(width=3,height=3,pointsize=8)

> hist(window(rstandard(m1.co2),start=c(1995,2)),xlab='Standardized Residuals')
Exhibit 10.14 displays the QQ-normal plot for the residuals.


Exhibit 10.14 Residuals: ARIMA(0,1,1)x(0,1,1)12 Model

[QQ-normal plot; horizontal axis: Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)

> qqnorm(window(rstandard(m1.co2),start=c(1995,2)))

> qqline(window(rstandard(m1.co2),start=c(1995,2)))


Here we again see the one outlier in the upper tail, but the Shapiro-Wilk test of normality has a test statistic of $W = 0.982$, leading to a p-value of 0.11, and normality is not rejected at any of the usual significance levels.

As one further check on the model, we consider overfitting with an ARIMA$(0,1,2)\times(0,1,1)_{12}$ model with the results shown in Exhibit 10.15.


Exhibit 10.15 ARIMA(0,1,2)x(0,1,1)12 Overfitted Model

Coefficient       θ1       θ2       Θ
Estimate          0.5714   0.0165   0.8274
Standard error    0.0897   0.0948   0.1224

$\hat{\sigma}_e^2 = 0.5427$: log-likelihood = -139.52, AIC = 285.05

> m2.co2=arima(co2,order=c(0,1,2),seasonal=list(order=c(0,1,1),period=12))

> m2.co2


When we compare these results with those reported in Exhibit 10.10 on page 237, we see that the estimates of $\theta_1$ and $\Theta$ have changed very little, especially when the size of the standard errors is taken into consideration. In addition, the estimate of the new parameter, $\theta_2$, is not statistically different from zero. Note also that the estimate $\hat{\sigma}_e^2$ and the log-likelihood have not changed much while the AIC has actually increased.

The ARIMA$(0,1,1)\times(0,1,1)_{12}$ model was popularized in the first edition of the seminal book of Box and Jenkins (1976) when it was found to characterize the logarithms of a monthly airline passenger time series. This model has come to be known as the airline model. We ask you to analyze the original airline data in the exercises.

10.5  Forecasting  Seasonal  Models 


Computing forecasts with seasonal ARIMA models is, as expected, most easily carried out recursively using the difference equation form for the model, as in Equations (9.3.28), (9.3.29) on page 199 and (9.3.40) on page 201. For example, consider the model ARIMA$(0,1,1)\times(1,0,1)_{12}$.

$$Y_t - Y_{t-1} = \Phi(Y_{t-12} - Y_{t-13}) + e_t - \theta e_{t-1} - \Theta e_{t-12} + \theta\Theta e_{t-13} \tag{10.5.1}$$

which we rewrite as

$$Y_t = Y_{t-1} + \Phi Y_{t-12} - \Phi Y_{t-13} + e_t - \theta e_{t-1} - \Theta e_{t-12} + \theta\Theta e_{t-13} \tag{10.5.2}$$

The one-step-ahead forecast from origin $t$ is then

$$\hat{Y}_t(1) = Y_t + \Phi Y_{t-11} - \Phi Y_{t-12} - \theta e_t - \Theta e_{t-11} + \theta\Theta e_{t-12} \tag{10.5.3}$$

and the next one is

$$\hat{Y}_t(2) = \hat{Y}_t(1) + \Phi Y_{t-10} - \Phi Y_{t-11} - \Theta e_{t-10} + \theta\Theta e_{t-11} \tag{10.5.4}$$

and so forth. The noise terms $e_{t-13}, e_{t-12}, e_{t-11}, \ldots, e_t$ (as residuals) will enter into the forecasts for lead times $\ell = 1, 2, \ldots, 13$, but for $\ell > 13$ the autoregressive part of the model takes over and we have

$$\hat{Y}_t(\ell) = \hat{Y}_t(\ell - 1) + \Phi\hat{Y}_t(\ell - 12) - \Phi\hat{Y}_t(\ell - 13) \quad \text{for } \ell > 13 \tag{10.5.5}$$

To  understand  the  general  nature  of  the  forecasts,  we  consider  several  special  cases. 

Seasonal AR(1)$_{12}$

The seasonal AR(1)$_{12}$ model is

$$ Y_t = \Phi Y_{t-12} + e_t \tag{10.5.6} $$

Clearly, we have

$$ \hat{Y}_t(\ell) = \Phi\hat{Y}_t(\ell-12) \tag{10.5.7} $$

However, iterating back on $\ell$, we can also write

$$ \hat{Y}_t(\ell) = \Phi^{k+1} Y_{t+r-11} \tag{10.5.8} $$

where $k$ and $r$ are defined by $\ell = 12k + r + 1$ with $0 \le r < 12$ and $k = 0, 1, 2, \ldots$. In other
words, $k$ is the integer part of $(\ell-1)/12$ and $r/12$ is the fractional part of $(\ell-1)/12$. If our
last observation is in December, then the next January value is forecast as $\Phi$ times the
last observed January value, February is forecast as $\Phi$ times the last observed February
value, and so on. Two Januarys ahead is forecast as $\Phi^2$ times the last observed January.
Looking just at January values, the forecasts into the future will decay exponentially at a
rate determined by the magnitude of $\Phi$. All of the forecasts for each month will behave
similarly but with different initial forecasts depending on the particular month under
consideration.
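The equivalence of the recursive form (10.5.7) and the closed form (10.5.8) can be checked numerically. The following is a minimal sketch in base R; the value of `Phi` and the twelve "last observed" values in `last12` are made-up illustrative numbers, not taken from any data set in the book.

```r
# Sketch: forecasts for a seasonal AR(1)_12 model, Y_t = Phi * Y_{t-12} + e_t.
# Phi and last12 are hypothetical values chosen only for illustration.
Phi    <- 0.8
last12 <- c(5, 3, 1, -2, -4, -5, -4, -2, 1, 3, 4, 6)  # Y_{t-11}, ..., Y_t

n_ahead <- 36

# Recursive form (10.5.7): each forecast is Phi times the value 12 steps back.
# Position 12 holds Y_t, so position 12 + l holds the lead-l forecast.
hist_and_fc <- c(last12, numeric(n_ahead))
for (l in seq_len(n_ahead)) {
  hist_and_fc[12 + l] <- Phi * hist_and_fc[l]
}
fc_recursive <- hist_and_fc[-(1:12)]

# Closed form (10.5.8): l = 12k + r + 1, forecast = Phi^(k+1) * Y_{t+r-11}.
l <- seq_len(n_ahead)
k <- (l - 1) %/% 12
r <- (l - 1) %% 12
fc_closed <- Phi^(k + 1) * last12[r + 1]

print(all.equal(fc_recursive, fc_closed))
```

Plotting `fc_closed` against the lead time shows the same seasonal shape repeated each year, shrunk by a further factor of `Phi` every 12 steps.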

Using Equation (9.3.38) on page 201 and the fact that the $\psi$-weights are nonzero
only for multiples of 12, namely,

$$ \psi_j = \begin{cases} \Phi^{j/12} & \text{for } j = 0, 12, 24, \ldots \\ 0 & \text{otherwise} \end{cases} \tag{10.5.9} $$

we have that the forecast error variance can be written as

$$ \operatorname{Var}(e_t(\ell)) = \left[ \frac{1 - \Phi^{2k+2}}{1 - \Phi^2} \right] \sigma_e^2 \tag{10.5.10} $$

where, as before, $k$ is the integer part of $(\ell-1)/12$.
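As a numerical check of (10.5.10), the forecast error variance can be accumulated directly from the $\psi$-weights, $\sigma_e^2 \sum_{j=0}^{\ell-1} \psi_j^2$, and compared with the closed form. This sketch uses `ARMAtoMA()` from base R's stats package; `Phi` and `sigma2` are assumed illustrative values.

```r
# Sketch: forecast error variance for a seasonal AR(1)_12 model.
# Phi and sigma2 are hypothetical values chosen only for illustration.
Phi    <- 0.8
sigma2 <- 1
lead   <- 1:36

# psi_0 = 1, followed by psi_1, ..., psi_35 for the AR polynomial 1 - Phi * B^12
psi <- c(1, ARMAtoMA(ar = c(rep(0, 11), Phi), lag.max = max(lead) - 1))

# Variance at lead l is sigma2 * (psi_0^2 + ... + psi_{l-1}^2)
var_from_psi <- sigma2 * cumsum(psi^2)

# Closed form (10.5.10) with k the integer part of (l - 1) / 12
k <- (lead - 1) %/% 12
var_closed <- sigma2 * (1 - Phi^(2 * k + 2)) / (1 - Phi^2)

print(all.equal(var_from_psi, var_closed))
```

The variance is constant within each forecast year and steps up only when the lead time crosses a multiple of 12, which is what the dependence on $k$ alone implies.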


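Seasonal MA(1)$_{12}$

For the seasonal MA(1)$_{12}$ model, we have

$$ Y_t = e_t - \Theta e_{t-12} + \theta_0 \tag{10.5.11} $$

In this case, we see that

$$ \begin{aligned} \hat{Y}_t(1) &= -\Theta e_{t-11} + \theta_0 \\ \hat{Y}_t(2) &= -\Theta e_{t-10} + \theta_0 \\ &\;\;\vdots \\ \hat{Y}_t(12) &= -\Theta e_t + \theta_0 \end{aligned} \tag{10.5.12} $$

and

$$ \hat{Y}_t(\ell) = \theta_0 \quad \text{for } \ell > 12 \tag{10.5.13} $$

Here we obtain different forecasts for the months of the first year, but from then on all
forecasts are given by the process mean.

For this model, $\psi_0 = 1$, $\psi_{12} = -\Theta$, and $\psi_j = 0$ otherwise. Thus, from Equation
(9.3.38) on page 201,

$$ \operatorname{Var}(e_t(\ell)) = \begin{cases} \sigma_e^2 & 1 \le \ell \le 12 \\ (1 + \Theta^2)\sigma_e^2 & 12 < \ell \end{cases} \tag{10.5.14} $$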


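ARIMA$(0,0,0)\times(0,1,1)_{12}$

The ARIMA$(0,0,0)\times(0,1,1)_{12}$ model is

$$ Y_t - Y_{t-12} = e_t - \Theta e_{t-12} \tag{10.5.15} $$

or

$$ Y_{t+\ell} = Y_{t+\ell-12} + e_{t+\ell} - \Theta e_{t+\ell-12} $$

so that

$$ \begin{aligned} \hat{Y}_t(1) &= Y_{t-11} - \Theta e_{t-11} \\ \hat{Y}_t(2) &= Y_{t-10} - \Theta e_{t-10} \\ &\;\;\vdots \\ \hat{Y}_t(12) &= Y_t - \Theta e_t \end{aligned} \tag{10.5.16} $$

and then

$$ \hat{Y}_t(\ell) = \hat{Y}_t(\ell-12) \quad \text{for } \ell > 12 \tag{10.5.17} $$

It follows that all Januarys will forecast identically, all Februarys identically, and so
forth.

If we invert this model, we find that

$$ Y_t = (1-\Theta)(Y_{t-12} + \Theta Y_{t-24} + \Theta^2 Y_{t-36} + \cdots) + e_t $$

Consequently, we can write

$$ \begin{aligned} \hat{Y}_t(1) &= (1-\Theta)\sum_{j=0}^{\infty} \Theta^j Y_{t-11-12j} \\ \hat{Y}_t(2) &= (1-\Theta)\sum_{j=0}^{\infty} \Theta^j Y_{t-10-12j} \\ &\;\;\vdots \\ \hat{Y}_t(12) &= (1-\Theta)\sum_{j=0}^{\infty} \Theta^j Y_{t-12j} \end{aligned} \tag{10.5.18} $$

From this representation, we see that the forecast for each January is an exponentially
weighted moving average of all observed Januarys, and similarly for each of the other
months.

In this case, we have $\psi_j = 1 - \Theta$ for $j = 12, 24, \ldots$, and zero otherwise (apart from
$\psi_0 = 1$). The forecast error variance is then

$$ \operatorname{Var}(e_t(\ell)) = [1 + k(1-\Theta)^2]\sigma_e^2 \tag{10.5.19} $$

where $k$ is the integer part of $(\ell-1)/12$.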
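The $\psi$-weights and the variance formula (10.5.19) for this model can be verified in a few lines. The sketch below relies on `ARMAtoMA()` from base R's stats package, which simply runs the $\psi$-weight recursion and so can be handed the seasonal difference $1 - B^{12}$ as a (nonstationary) AR polynomial; note that R writes MA polynomials with plus signs, so the book's $\Theta$ enters as `-Theta`. The value `Theta = 0.6` is an arbitrary illustrative choice.

```r
# Sketch: psi-weights and forecast error variance for ARIMA(0,0,0)x(0,1,1)_12.
# Theta and sigma2 are hypothetical values chosen only for illustration.
Theta  <- 0.6
sigma2 <- 1
lead   <- 1:36

# AR side: seasonal difference 1 - B^12; MA side: 1 - Theta * B^12 (R sign convention)
psi <- c(1, ARMAtoMA(ar = c(rep(0, 11), 1),
                     ma = c(rep(0, 11), -Theta),
                     lag.max = max(lead) - 1))

# Variance accumulated from the psi-weights versus the closed form (10.5.19)
var_from_psi <- sigma2 * cumsum(psi^2)
k <- (lead - 1) %/% 12
var_closed <- sigma2 * (1 + k * (1 - Theta)^2)
print(all.equal(var_from_psi, var_closed))

# The seasonal EWMA weights (1 - Theta) * Theta^j in (10.5.18) sum to one
ewma_weights <- (1 - Theta) * Theta^(0:200)
print(sum(ewma_weights))
```

Because the weights sum to one, each monthly forecast is a genuine weighted average of past values for that month, with the most recent year weighted most heavily when $0 < \Theta < 1$.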


ARIMA$(0,1,1)\times(0,1,1)_{12}$

For the ARIMA$(0,1,1)\times(0,1,1)_{12}$ model

$$ Y_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + e_t - \theta e_{t-1} - \Theta e_{t-12} + \theta\Theta e_{t-13} \tag{10.5.20} $$

the forecasts satisfy

$$ \begin{aligned} \hat{Y}_t(1) &= Y_t + Y_{t-11} - Y_{t-12} - \theta e_t - \Theta e_{t-11} + \theta\Theta e_{t-12} \\ \hat{Y}_t(2) &= \hat{Y}_t(1) + Y_{t-10} - Y_{t-11} - \Theta e_{t-10} + \theta\Theta e_{t-11} \\ &\;\;\vdots \\ \hat{Y}_t(12) &= \hat{Y}_t(11) + Y_t - Y_{t-1} - \Theta e_t + \theta\Theta e_{t-1} \\ \hat{Y}_t(13) &= \hat{Y}_t(12) + \hat{Y}_t(1) - Y_t + \theta\Theta e_t \end{aligned} \tag{10.5.21} $$

and

$$ \hat{Y}_t(\ell) = \hat{Y}_t(\ell-1) + \hat{Y}_t(\ell-12) - \hat{Y}_t(\ell-13) \quad \text{for } \ell > 13 \tag{10.5.22} $$

To understand the general pattern of these forecasts, we can use the representation

$$ \hat{Y}_t(\ell) = A_1 + A_2\ell + \sum_{j=0}^{6}\left[ B_{1j}\cos\left(\frac{2\pi j\ell}{12}\right) + B_{2j}\sin\left(\frac{2\pi j\ell}{12}\right) \right] \tag{10.5.23} $$

where the $A$'s and $B$'s are dependent on $Y_t, Y_{t-1}, \ldots$, or, alternatively, determined from
the initial forecasts $\hat{Y}_t(1), \hat{Y}_t(2), \ldots, \hat{Y}_t(13)$. This result follows from the general
theory of difference equations and involves the roots of $(1 - x)(1 - x^{12}) = 0$.

Notice that Equation (10.5.23) reveals that the forecasts are composed of a linear
trend in the lead time plus a sum of periodic components. However, the coefficients $A_i$
and $B_{ij}$ are more dependent on recent data than on past data and will adapt to changes in
the process as our forecast origin changes and the forecasts are updated. This is in stark
contrast to forecasting with deterministic time trend plus seasonal components, where
the coefficients depend rather equally on both recent and past data and remain the same
for all future forecasts.
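The difference equation (10.5.22) can be confirmed on forecasts produced by software. The sketch below is an illustration under stated assumptions: it fits the airline model with base R's `arima()` to the logged `AirPassengers` series that ships with R (a stand-in chosen for convenience, not the book's `co2` or `airpass` files) and checks that forecasts at lead times beyond 13 satisfy the recursion.

```r
# Sketch: forecasts from a fitted airline model obey (10.5.22) for leads > 13.
# AirPassengers (base R datasets) is used here only as a convenient example.
y   <- log(AirPassengers)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts for lead times 1, ..., 48
fc <- as.numeric(predict(fit, n.ahead = 48)$pred)

# Residual of the recursion fc(l) = fc(l-1) + fc(l-12) - fc(l-13) for l = 14..48
l <- 14:48
recursion_gap <- fc[l] - (fc[l - 1] + fc[l - 12] - fc[l - 13])
print(max(abs(recursion_gap)))
```

The gap should be zero up to floating-point error, since once the lead time exceeds 13 no residual terms remain in the forecast equation.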

Prediction  Limits 

Prediction limits are obtained precisely as in the nonseasonal case. We illustrate this
with the carbon dioxide time series. Exhibit 10.16 shows the forecasts and 95% forecast
limits for a lead time of two years for the ARIMA$(0,1,1)\times(0,1,1)_{12}$ model that we fit.
The last two years of observed data are also shown. The forecasts mimic the stochastic
periodicity in the data quite well, and the forecast limits give a good feeling for the
precision of the forecasts.




Exhibit 10.16  Forecasts and Forecast Limits for the CO2 Model

> win.graph(width=4.875,height=3,pointsize=8)
> plot(m1.co2,n1=c(2003,1),n.ahead=24,xlab='Year',type='o',
    ylab='CO2 Levels')


Exhibit  10.17  displays  the  last  year  of  observed  data  and  forecasts  out  four  years. 
At  this  lead  time,  it  is  easy  to  see  that  the  forecast  limits  are  getting  wider,  as  there  is 
more  uncertainty  in  the  forecasts. 


Exhibit 10.17  Long-Term Forecasts for the CO2 Model

> plot(m1.co2,n1=c(2004,1),n.ahead=48,xlab='Year',type='b',
    ylab='CO2 Levels')
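The widening of the forecast limits with lead time can also be seen numerically. The following is a minimal sketch using only base R: it computes 95% limits as forecast plus or minus 1.96 standard errors from `predict()`. It is fitted to the logged `AirPassengers` series as a stand-in, so the numbers are not those of the CO2 exhibits.

```r
# Sketch: 95% prediction limits for an airline model, using base R only.
# AirPassengers (base R datasets) stands in for the book's co2 series.
y   <- log(AirPassengers)
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

fc    <- predict(fit, n.ahead = 48)       # $pred: forecasts, $se: standard errors
z     <- qnorm(0.975)                     # about 1.96
lower <- fc$pred - z * fc$se
upper <- fc$pred + z * fc$se

# Width of the interval at each lead time
width <- as.numeric(upper - lower)
print(round(width[c(1, 12, 24, 48)], 3))
```

For this fitted model the interval width grows at every step, reflecting the accumulation of squared $\psi$-weights in the forecast error variance.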
10.6  Summary


Multiplicative seasonal ARIMA models provide an economical way to model time
series whose seasonal tendencies are not as regular as we would have with a deterministic
seasonal trend model, which we covered in Chapter 3. Fortunately, these models are
simply special ARIMA models so that no new theory is needed to investigate their
properties. We illustrated the special nature of these models with a thorough modeling of an
actual time series.


Exercises 


10.1  Based on quarterly data, a seasonal model of the form

$$ Y_t = Y_{t-4} + e_t - \theta_1 e_{t-1} - \theta_2 e_{t-2} $$

has been fit to a certain time series.

(a)  Find the first four $\psi$-weights for this model.

(b)  Suppose that $\theta_1 = 0.5$, $\theta_2 = -0.25$, and $\sigma_e = 1$. Find forecasts for the next four
quarters if data for the last four quarters are

Quarter     I     II    III   IV
Series      25    20    25    40
Residual    2     1     2     3

(c)  Find 95% prediction intervals for the forecasts in part (b).

10.2  An AR model has AR characteristic polynomial

$$ (1 - 1.6x + 0.7x^2)(1 - 0.8x^{12}) $$

(a)  Is  the  model  stationary? 

(b)  Identify  the  model  as  a  certain  seasonal  ARIMA  model. 

10.3  Suppose that $\{Y_t\}$ satisfies

$$ Y_t = a + bt + S_t + X_t $$

where $S_t$ is deterministic and periodic with period $s$ and $\{X_t\}$ is a seasonal
ARIMA$(p,0,q)\times(P,1,Q)_s$ series. What is the model for $W_t = Y_t - Y_{t-s}$?

10.4  For the seasonal model $Y_t = \Phi Y_{t-4} + e_t - \theta e_{t-1}$ with $|\Phi| < 1$, find $\gamma_0$ and $\rho_k$.

10.5  Identify  the  following  as  certain  multiplicative  seasonal  ARIMA  models: 

(a)  $Y_t = 0.5Y_{t-1} + Y_{t-4} - 0.5Y_{t-5} + e_t - 0.3e_{t-1}$.

(b)  $Y_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + e_t - 0.5e_{t-1} - 0.5e_{t-12} + 0.25e_{t-13}$.

10.6  Verify Equations (10.2.11) on page 232.

10.7  Suppose that the process $\{Y_t\}$ develops according to $Y_t = Y_{t-4} + e_t$ with $Y_t = e_t$
for $t = 1, 2, 3,$ and $4$.

(a)  Find the variance function for $\{Y_t\}$.

(b)  Find the autocorrelation function for $\{Y_t\}$.

(c)  Identify the model for $\{Y_t\}$ as a certain seasonal ARIMA model.

10.8  Consider  the  Alert,  Canada,  monthly  carbon  dioxide  time  series  shown  in  Exhibit 
10.1  on  page  227.  The  data  are  in  the  file  named  co2. 

(a)  Fit  a  deterministic  seasonal  means  plus  linear  time  trend  model  to  these  data. 
Are  any  of  the  regression  coefficients  “statistically  significant”? 

(b)  What  is  the  multiple  R-squared  for  this  model? 

(c)  Now  calculate  the  sample  autocorrelation  of  the  residuals  from  this  model. 
Interpret  the  results. 

10.9  The  monthly  airline  passenger  time  series,  first  investigated  in  Box  and  Jenkins 
(1976),  is  considered  a  classic  time  series.  The  data  are  in  the  file  named  airpass. 

(a)  Display  the  time  series  plots  of  both  the  original  series  and  the  logarithms  of 
the  series.  Argue  that  taking  logs  is  an  appropriate  transformation. 

(b)  Display  and  interpret  the  time  series  plots  of  the  first  difference  of  the  logged 
series. 

(c)  Display  and  interpret  the  time  series  plot  of  the  seasonal  difference  of  the  first 
difference  of  the  logged  series. 

(d)  Calculate  and  interpret  the  sample  ACF  of  the  seasonal  difference  of  the  first 
difference  of  the  logged  series. 

(e)  Fit the “airline model” (ARIMA$(0,1,1)\times(0,1,1)_{12}$) to the logged series.

(f)  Investigate  diagnostics  for  this  model,  including  autocorrelation  and  normality 
of  the  residuals. 

(g)  Produce  forecasts  for  this  series  with  a  lead  time  of  two  years.  Be  sure  to 
include  forecast  limits. 

10.10  Exhibit  5.8  on  page  99  displayed  the  monthly  electricity  generated  in  the  United 
States.  We  argued  there  that  taking  logarithms  was  appropriate  for  modeling. 
Exhibit  5.10  on  page  100  showed  the  time  series  plot  of  the  first  differences  for 
this  series.  The  filename  is  electricity. 

(a)  Calculate  the  sample  ACF  of  the  first  difference  of  the  logged  series.  Is  the 
seasonality  visible  in  this  display? 

(b)  Plot  the  time  series  of  seasonal  difference  and  first  difference  of  the  logged 
series.  Does  a  stationary  model  seem  appropriate  now? 

(c)  Display  the  sample  ACF  of  the  series  after  a  seasonal  difference  and  a  first 
difference  have  been  taken  of  the  logged  series.  What  model(s)  might  you 
consider  for  the  electricity  series? 
10.11  The quarterly earnings per share for 1960-1980 of the U.S. company Johnson &
Johnson are saved in the file named JJ.

(a)  Plot  the  time  series  and  also  the  logarithm  of  the  series.  Argue  that  we  should 
transform  by  logs  to  model  this  series. 

(b)  The  series  is  clearly  not  stationary.  Take  first  differences  and  plot  that  series. 
Does  stationarity  now  seem  reasonable? 

(c)  Calculate  and  graph  the  sample  ACF  of  the  first  differences.  Interpret  the 
results. 

(d)  Display  the  plot  of  seasonal  differences  and  the  first  differences.  Interpret  the 
plot.  Recall  that  for  quarterly  data,  a  season  is  of  length  4. 

(e)  Graph  and  interpret  the  sample  ACF  of  seasonal  differences  with  the  first  dif¬ 
ferences. 

(f)  Fit the model ARIMA$(0,1,1)\times(0,1,1)_4$, and assess the significance of the
estimated coefficients.

(g)  Perform  all  of  the  diagnostic  tests  on  the  residuals. 

(h)  Calculate  and  plot  forecasts  for  the  next  two  years  of  the  series.  Be  sure  to 
include  forecast  limits. 

10.12  The file named boardings contains monthly data on the number of people who
boarded transit vehicles (mostly light rail trains and city buses) in the Denver,
Colorado, region for August 2000 through December 2005.

(a)  Produce  the  time  series  plot  for  these  data.  Be  sure  to  use  plotting  symbols 
that  will  help  you  assess  seasonality.  Does  a  stationary  model  seem  reason¬ 
able? 

(b)  Calculate  and  plot  the  sample  ACF  for  this  series.  At  which  lags  do  you  have 
significant  autocorrelation? 

(c)  Fit an ARMA$(0,3)\times(1,0)_{12}$ model to these data. Assess the significance of the
estimated coefficients.

(d)  Overfit with an ARMA$(0,4)\times(1,0)_{12}$ model. Interpret the results.


Chapter 11


Time  Series  Regression  Models 


In  this  chapter,  we  introduce  several  useful  ideas  that  incorporate  external  information 
into  time  series  modeling.  We  start  with  models  that  include  the  effects  of  interventions 
on  time  series’  normal  behavior.  We  also  consider  models  that  assimilate  the  effects  of 
outliers — observations,  either  in  the  observed  series  or  in  the  error  terms,  that  are  highly 
unusual  relative  to  normal  behavior.  Lastly,  we  develop  methods  to  look  for  and  deal 
with  spurious  correlation — correlation  between  series  that  is  artificial  and  will  not  help 
model  or  understand  the  time  series  of  interest.  We  will  see  that  prewhitening  of  series 
helps  us  find  meaningful  relationships. 

11.1  Intervention  Analysis 


Exhibit 11.1 shows the time plot of the logarithms of monthly airline passenger-miles in
the United States from January 1996 through May 2005. The time series is highly
seasonal, displaying the fact that air traffic is generally higher during the summer months
and the December holidays and lower in the winter months.* Also, air traffic was
increasing somewhat linearly overall until it had a sudden drop in September 2001. The
sudden drop in the number of air passengers in September 2001 and several months
thereafter was triggered by the terrorist acts on September 11, 2001, when four planes
were hijacked, three of which were crashed into the twin towers of the World Trade
Center and the Pentagon and the fourth into a rural field in Pennsylvania. The terrorist
attacks of September 2001 deeply depressed air traffic around that period, but air traffic
gradually regained the losses as time went on. This is an example of an intervention that
results in a change in the trend of a time series.

Intervention analysis, introduced by Box and Tiao (1975), provides a framework
for assessing the effect of an intervention on a time series under study. It is assumed that
the intervention affects the process by changing the mean function or trend of a time
series. Interventions can be natural or man-made. For example, some animal population
levels may crash to a very low level in a particular year because of extreme climate in that
year. The postcrash annual population level may then be expected to be different from
that in the precrash period. Another example is the increase of the speed limit from 65
miles per hour to 70 miles per hour on an interstate highway. This may make driving on
the highway more dangerous. On the other hand, drivers may stay on the highway for a
shorter length of time because of the faster speed, so the net effect of the increased
speed limit is unclear. The effect of the increase in speed limit may be studied by
analyzing the mean function of some accident time series data; for example, the
quarterly number of fatal car accidents on some segment of an interstate highway. (Note that
the autocovariance function of the time series might also be changed by the intervention,
but this possibility will not be pursued here.)

* In the exercises, we ask you to display the time series plot using seasonal plotting symbols
on a full-screen graph, where the seasonality is quite easy to see.


[Exhibit 11.1: time series plot of Log(airmiles) against Year]

> win.graph(width=4.875,height=2.5,pointsize=8)
> data(airmiles)
> plot(log(airmiles),ylab='Log(airmiles)',xlab='Year')


We first consider the simple case of a single intervention. The general model for the
time series $\{Y_t\}$, perhaps after suitable transformation, is given by

$$ Y_t = m_t + N_t \tag{11.1.1} $$

where $m_t$ is the change in the mean function and $N_t$ is modeled as some ARIMA
process, possibly seasonal. The process $\{N_t\}$ represents the underlying time series were
there no intervention. It is referred to as the natural or unperturbed process, and it may
be stationary or nonstationary, seasonal or nonseasonal. Suppose the time series is
subject to an intervention that takes place at time $T$. Before $T$, $m_t$ is assumed to be
identically zero. The time series $\{Y_t, t < T\}$ is referred to as the preintervention data and can
be used to specify the model for the unperturbed process $N_t$.

Based on subject matter considerations, the effect of the intervention on the mean
function can often be specified up to some parameters. A useful function in this
specification is the step function

$$ S_t^{(T)} = \begin{cases} 1, & \text{if } t \ge T \\ 0, & \text{otherwise} \end{cases} \tag{11.1.2} $$

that is 0 during the preintervention period and 1 throughout the postintervention period.

The pulse function

$$ P_t^{(T)} = S_t^{(T)} - S_{t-1}^{(T)} \tag{11.1.3} $$

equals 1 at $t = T$ and 0 otherwise. That is, $P_t^{(T)}$ is the indicator or dummy variable
flagging the time that the intervention takes place. If the intervention results in an
immediate and permanent shift in the mean function, the shift can be modeled as

$$ m_t = \omega S_t^{(T)} \tag{11.1.4} $$

where $\omega$ is the unknown permanent change in the mean due to the intervention. Testing
whether $\omega = 0$ or not is similar to testing whether the population means are the same
with data in the form of two independent random samples from the two populations.
However, the major difference here is that the pre- and postintervention data cannot
generally be assumed to be independent and identically distributed. The inherent serial
correlation in the data makes the problem more interesting but at the same time more
difficult. If there is a delay of $d$ time units before the intervention takes effect and $d$ is
known, then we can specify

$$ m_t = \omega S_{t-d}^{(T)} \tag{11.1.5} $$


In practice, the intervention may affect the mean function gradually, with its full force
reflected only in the long run. This can be modeled by specifying $m_t$ as an AR(1)-type
model with the error term replaced by a multiple of the lag 1 of $S_t^{(T)}$:

$$ m_t = \delta m_{t-1} + \omega S_{t-1}^{(T)} \tag{11.1.6} $$

with the initial condition $m_0 = 0$. After some algebra, it can be shown that

$$ m_t = \begin{cases} \omega\dfrac{1 - \delta^{t-T}}{1 - \delta}, & \text{for } t > T \\ 0, & \text{otherwise} \end{cases} \tag{11.1.7} $$

Often $\delta$ is selected in the range $1 > \delta > 0$. In that case, $m_t$ approaches $\omega/(1 - \delta)$ for
large $t$, which is the ultimate change (gain or loss) for the mean function. Half of the
ultimate change is attained when $1 - \delta^{t-T} = 0.5$; that is, when $t = T + \log(0.5)/\log(\delta)$.
The duration $\log(0.5)/\log(\delta)$ is called the half-life of the intervention effect, and the
shorter it is, the quicker the ultimate change is felt by the system. Exhibit 11.2 displays
the half-life as a function of $\delta$, which shows that the half-life increases with $\delta$. Indeed,
the half-life becomes infinitely large when $\delta$ approaches 1.


Exhibit 11.2  Half-life based on an AR(1) Process with Step Function Input

$\delta$      0.2    0.4    0.6    0.8    0.9    1
Half-life     0.43   0.76   1.36   3.11   6.58   $\infty$
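The half-life values follow directly from the formula $\log(0.5)/\log(\delta)$ and are easy to recompute. A minimal sketch in base R (the vector name `delta` is just an illustrative choice); note that direct evaluation at $\delta = 0.6$ gives about 1.36:

```r
# Sketch: half-life log(0.5) / log(delta) of a step-input AR(1)-type response
delta     <- c(0.2, 0.4, 0.6, 0.8, 0.9)
half_life <- log(0.5) / log(delta)

# Tabulate delta against its half-life, rounded to two decimals
print(data.frame(delta = delta, half_life = round(half_life, 2)))
```

As $\delta$ moves toward 1, `log(delta)` tends to 0 from below and the half-life grows without bound, which matches the last column of the exhibit.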
It is interesting to note the limiting case when $\delta = 1$. Then $m_t = \omega(t - T)$ for $t > T$ and
0 otherwise. The time sequence plot of $m_t$ displays the shape of a ramp with slope $\omega$.
This specification implies that the intervention changes the mean function linearly in the
postintervention period. This ramp effect (with a one time unit delay) is shown in
Exhibit 11.3 (c).

Short-lived intervention effects may be specified using the pulse dummy variable

$$ P_t^{(T)} = \begin{cases} 1, & \text{if } t = T \\ 0, & \text{otherwise} \end{cases} \tag{11.1.8} $$

For example, if the intervention impacts the mean function only at $t = T$, then

$$ m_t = \omega P_t^{(T)} \tag{11.1.9} $$

Intervention effects that die out gradually may be specified via the AR(1)-type
specification

$$ m_t = \delta m_{t-1} + \omega P_t^{(T)} \tag{11.1.10} $$

That is, $m_t = \omega\delta^{t-T}$ for $t \ge T$ so that the mean changes immediately by an amount $\omega$ and
subsequently the change in the mean decreases geometrically by the common factor of
$\delta$; see Exhibit 11.4 (a). Delayed changes can be incorporated by lagging the pulse
function. For example, if the change in the mean takes place after a delay of one time unit
and the effect dies out gradually, we can specify

$$ m_t = \delta m_{t-1} + \omega P_{t-1}^{(T)} \tag{11.1.11} $$

Again, we assume the initial condition $m_0 = 0$.
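The AR(1)-type recursions for step and pulse inputs can be generated with a recursive filter and compared with their closed forms. This is a minimal sketch in base R; `omega`, `delta`, the series length `n`, and the intervention time `T0` are arbitrary illustrative values, and `lag1()` is a small hypothetical helper defined here.

```r
# Sketch: intervention mean functions m_t from AR(1)-type recursions.
# omega, delta, n, and T0 are hypothetical values chosen only for illustration.
omega <- 2
delta <- 0.7
n     <- 30
T0    <- 10
t     <- 1:n

step  <- as.numeric(t >= T0)   # S_t^(T)
pulse <- as.numeric(t == T0)   # P_t^(T)
lag1  <- function(x) c(0, x[-length(x)])  # helper: shift a series back one step

# Step input, (11.1.6): m_t = delta * m_{t-1} + omega * S_{t-1}, with m_0 = 0
m_step <- as.numeric(stats::filter(omega * lag1(step), filter = delta,
                                   method = "recursive"))
m_step_closed <- ifelse(t > T0, omega * (1 - delta^(t - T0)) / (1 - delta), 0)

# Pulse input, (11.1.10): m_t = delta * m_{t-1} + omega * P_t, with m_0 = 0
m_pulse <- as.numeric(stats::filter(omega * pulse, filter = delta,
                                    method = "recursive"))
m_pulse_closed <- ifelse(t >= T0, omega * delta^(t - T0), 0)

print(all.equal(m_step, m_step_closed))
print(all.equal(m_pulse, m_pulse_closed))
```

With these values the step response levels off near `omega / (1 - delta)`, while the pulse response starts at `omega` and decays geometrically toward zero.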

It is useful to write† the preceding model in terms of the backshift operator $B$,
where $Bm_t = m_{t-1}$ and $BP_t^{(T)} = P_{t-1}^{(T)}$. Then $(1 - \delta B)m_t = \omega B P_t^{(T)}$. Or, we can write

$$ m_t = \frac{\omega B}{1 - \delta B} P_t^{(T)} \tag{11.1.12} $$

Recall $(1 - B)S_t^{(T)} = P_t^{(T)}$, which can be rewritten as $S_t^{(T)} = \dfrac{1}{1 - B} P_t^{(T)}$.

† The remainder of this chapter makes use of the backshift operator introduced in Appendix
D on page 106. You may want to review that appendix before proceeding further.
Exhibit 11.3  Some Common Models for Step Response Interventions
(All are shown with a delay of 1 time unit)

(a)  $\omega B S_t^{(T)}$  [plot: mean shifts from 0 to $\omega$ after time $T$]

(b)  $\dfrac{\omega B}{1 - \delta B} S_t^{(T)}$  [plot: mean rises from $\omega$ toward $\omega/(1-\delta)$ after time $T$]

(c)  $\dfrac{\omega B}{1 - B} S_t^{(T)}$  [plot: ramp after time $T$ with slope $= \omega$]

Several specifications can be combined to model more sophisticated intervention
effects. For example,

$$ m_t = \frac{\omega_1 B}{1 - \delta B} P_t^{(T)} + \frac{\omega_2 B}{1 - B} P_t^{(T)} \tag{11.1.13} $$

depicts the situation displayed in Exhibit 11.4 (b) where $\omega_1$ and $\omega_2$ are both greater than
zero, and

$$ m_t = \omega_0 P_t^{(T)} + \frac{\omega_1 B}{1 - \delta B} P_t^{(T)} + \frac{\omega_2 B}{1 - B} P_t^{(T)} \tag{11.1.14} $$

may model situations like Exhibit 11.4 (c) with $\omega_1$ and $\omega_2$ both negative. This last case
may model the interesting situation where a special sale may cause strong rush buying,
initially so much so that the sale is followed by depressed demand. More generally, we
can model the change in the mean function by an ARMA-type specification

$$ m_t = \frac{\omega(B)}{\delta(B)} P_t^{(T)} \tag{11.1.15} $$

where $\omega(B)$ and $\delta(B)$ are some polynomials in $B$. Because $(1 - B)S_t^{(T)} = P_t^{(T)}$, the
model for $m_t$ can be specified in terms of either the pulse or step dummy variable.
Exhibit 11.4  Some Common Models for Pulse Response Interventions
(All are shown with a delay of 1 time unit)

(a)  $\dfrac{\omega B}{1 - \delta B} P_t^{(T)}$  [plot: levels marked 0 and $\omega$]

(b)  $\left[ \dfrac{\omega_1 B}{1 - \delta B} + \dfrac{\omega_2 B}{1 - B} \right] P_t^{(T)}$  [plot: levels marked 0, $\omega_1 + \omega_2$, and $\omega_2$]

(c)  $\left[ \omega_0 + \dfrac{\omega_1 B}{1 - \delta B} + \dfrac{\omega_2 B}{1 - B} \right] P_t^{(T)}$  [plot: levels marked 0, $\omega_0$, $\omega_2$, and $\omega_1 + \omega_2$]

Estimation of the parameters of an intervention model may be carried out by the
method of maximum likelihood estimation. Indeed, $Y_t - m_t$ is a seasonal ARIMA
process so that the likelihood function equals the joint pdf of $Y_t - m_t$, $t = 1, 2, \ldots, n$, which
can be computed by methods studied in Chapter 7 or else by the state space modeling
methods of Appendix H on page 222.

We now revisit the monthly passenger-airmiles data. Recall that the terrorist acts in
September 2001 had lingering depressing effects on air traffic. The intervention may be
specified as an AR(1) process with the pulse input at September 2001. But the
unexpected turn of events in September 2001 had a strong instantaneous chilling effect on air
traffic. Thus, we model the intervention effect (the 9/11 effect) as

$$ m_t = \omega_0 P_t^{(T)} + \frac{\omega_1}{1 - \omega_2 B} P_t^{(T)} $$

where $T$ denotes September 2001. In this specification, $\omega_0 + \omega_1$ represents the
instantaneous 9/11 effect, and, for $k \ge 1$, $\omega_1(\omega_2)^k$ gives the 9/11 effect $k$ months afterward. It
remains to specify the seasonal ARIMA structure of the underlying unperturbed
process. Based on the preintervention data, an ARIMA$(0,1,1)\times(0,1,0)_{12}$ model was
tentatively specified for the unperturbed process; see Exhibit 11.5.
Exhibit 11.5  Sample ACF for $(1-B)(1-B^{12})$ Log(Air Passenger Miles) Over
the Preintervention Period

> acf(as.vector(diff(diff(window(log(airmiles),end=c(2001,8)),
    12))),lag.max=48)

Model diagnostics of the fitted model suggested that a seasonal MA(1) coefficient
was needed and pointed to some additive outliers occurring in December 1996,
January 1997, and December 2002. (Outliers will be discussed in more detail later; here
additive outliers may be regarded as interventions of unknown nature that have a pulse
response function.) Hence, the model is specified as an ARIMA$(0,1,1)\times(0,1,1)_{12}$ plus
the 9/11 intervention and three additive outliers. The fitted model is summarized in
Exhibit 11.6.


Exhibit 11.6  Estimation of Intervention Model for Logarithms of Air Miles
(Standard errors are shown below the estimates)

$\theta$    $\Theta$    Dec96     Jan97     Dec02     $\omega_0$   $\omega_1$   $\omega_2$
0.383       0.650       0.099     -0.069    0.081     -0.095       -0.27        0.814
(0.093)     (0.119)     (0.023)   (0.022)   (0.020)   (0.046)      (0.044)      (0.098)

$\hat{\sigma}_e^2$ estimated as 0.000672: log-likelihood = 219.99, AIC = -423.98

> air.m1=arimax(log(airmiles),order=c(0,1,1),
    seasonal=list(order=c(0,1,1),period=12),
    xtransf=data.frame(I911=1*(seq(airmiles)==69),
    I911=1*(seq(airmiles)==69)),transfer=list(c(0,0),c(1,0)),
    xreg=data.frame(Dec96=1*(seq(airmiles)==12),
    Jan97=1*(seq(airmiles)==13),Dec02=1*(seq(airmiles)==84)),
    method='ML')
> air.m1




Model  diagnostics  suggested  that  the  fitted  model  above  provides  a  good  fit  to  the 
data.  The  open  circles  in  the  time  series  plot  shown  in  Exhibit  11.7  represent  the  fitted 
values  from  the  final  estimated  model.  They  indicate  generally  good  agreement  between 
the  model  and  the  data. 


Exhibit 11.7  Logs of Air Passenger Miles and Fitted Values

> plot(log(airmiles),ylab='Log(airmiles)')
> points(fitted(air.m1))


The fitted model estimates that the 9/11 intervention reduced air traffic by 31% =
$\{1 - \exp(-0.0949 - 0.2715)\} \times 100\%$ in September 2001, and air traffic $k$ months later
was lowered by $\{1 - \exp(-0.2715 \times 0.8139^k)\} \times 100\%$. Exhibit 11.8 graphs the estimated
9/11 effects on air traffic, which indicate that air traffic regained its losses toward the
end of 2003.
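The percentage reductions quoted above can be reproduced from the three intervention estimates alone. The sketch below uses the rounded values $-0.0949$, $-0.2715$, and $0.8139$ from the text; the variable names and the horizon of 27 months (September 2001 through December 2003) are illustrative choices.

```r
# Sketch: estimated 9/11 effect on air traffic, as a percentage reduction.
# Coefficient values are the estimates quoted in the text; names are illustrative.
omega0 <- -0.0949
omega1 <- -0.2715
omega2 <-  0.8139

k <- 0:27                             # months after September 2001
effect_log <- omega1 * omega2^k       # effect on the log scale k months later
effect_log[1] <- effect_log[1] + omega0  # add the instantaneous pulse term at k = 0

# Convert a shift on the log scale into a percentage reduction
pct_drop <- 100 * (1 - exp(effect_log))
print(round(pct_drop[c(1, 2, 13, 28)], 1))
```

The first entry is the roughly 31% drop in September 2001; later entries shrink geometrically as the factor `omega2^k` decays.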


Exhibit 11.8  The Estimated 9/11 Effects for the Air Passenger Series

> Nine11p=1*(seq(airmiles)==69)
> plot(ts(Nine11p*(-0.0949)+
    filter(Nine11p,filter=.8139,method='recursive',side=1)*
    (-0.2715),frequency=12,start=1996),ylab='9/11 Effects',
    type='h'); abline(h=0)


11.2  Outliers 


Outliers  refer  to  atypical  observations  that  may  arise  because  of  measurement  and/or 
copying  errors  or  because  of  abrupt,  short-term  changes  in  the  underlying  process.  For 
time  series,  two  kinds  of  outliers  can  be  distinguished,  namely  additive  outliers  and 
innovative  outliers.  These  two  kinds  of  outliers  are  often  abbreviated  as  AO  and  10, 
respectively.  An  additive  outlier  occurs  at  time  T  if  the  underlying  process  is  perturbed 
additively  at  time  T  so  that  the  data  equal 

Y't=  Yt  +  co^  (11.2.1) 

where  {Tf}  is  the  unperturbed  process.  Henceforth  in  this  section,  Y  denotes  the 
observed  process  that  may  be  affected  by  some  outliers  and  Y  the  unperturbed  process 
should  there  be  no  outliers.  Thus,  YT  =  Yf  +  wA  but  Yf  =  Yf  otherwise,  so  the  time 
series  is  only  affected  at  time  T  if  it  has  an  additive  outlier  at  T.  An  additive  outlier  can 
also  be  treated  as  an  intervention  that  has  a  pulse  response  at  T  so  that  mf  =  co^  Pi  T> . 

On the other hand, an innovative outlier occurs at time T if the error (also known as
an innovation) at time T is perturbed (that is, the errors equal $e'_t = e_t + \omega_I P_t^{(T)}$, where $e_t$
is a zero-mean white noise process). So, $e'_T = e_T + \omega_I$ but $e'_t = e_t$ otherwise. Suppose
that the unperturbed process is stationary and admits an MA($\infty$) representation

$$Y_t = e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots$$

Consequently, the perturbed process can be written

$$Y'_t = e'_t + \psi_1 e'_{t-1} + \psi_2 e'_{t-2} + \cdots = [e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots] + \psi_{t-T}\,\omega_I$$

or

$$Y'_t = Y_t + \psi_{t-T}\,\omega_I$$  (11.2.2)

where $\psi_0 = 1$ and $\psi_j = 0$ for negative j. Thus, an innovative outlier at T perturbs all
observations on and after T, although with diminishing effect, as the observation is further
away from the origin of the outlier.
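
The contrast between Equations (11.2.1) and (11.2.2) can be seen numerically. The following is a minimal sketch, assuming an AR(1) process so that $\psi_j = \phi^j$; the values of n, T0, omega, and phi are hypothetical choices made only for illustration.

```r
# Sketch: additive versus innovative outlier effects for an AR(1) process.
# All numbers (n, T0, omega, phi) are hypothetical choices for illustration.
n <- 20; T0 <- 5; omega <- 10; phi <- 0.8

# psi-weights of the MA(infinity) representation: psi_0 = 1, psi_j = phi^j
psi <- c(1, ARMAtoMA(ar = phi, lag.max = n - 1))

t <- 1:n
ao_effect <- omega * (t == T0)            # omega_A * P_t^(T): only time T0 is shifted
lag_from_T <- pmax(t - T0, 0)             # t - T, truncated at 0 to keep indices valid
io_effect <- ifelse(t >= T0, omega * psi[lag_from_T + 1], 0)  # psi_{t-T} * omega_I

print(round(cbind(t, ao_effect, io_effect), 3))
```

The additive outlier shifts a single observation, whereas the innovative outlier shifts every observation from T0 onward by an amount that decays geometrically.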

To detect whether an observation is an AO or IO, we use the AR($\infty$) representation
of the unperturbed process to define the residuals:

$$a_t = Y'_t - \pi_1 Y'_{t-1} - \pi_2 Y'_{t-2} - \cdots$$  (11.2.3)

For simplicity, we assume the process has zero mean and that the parameters are known.
In practice, the unknown parameter values are replaced by their estimates from the possibly
perturbed data. Under the null hypothesis of no outliers and for large samples, this
has a negligible effect on the properties of the test procedures described below. If the
series has exactly one IO at time T, then the residual $a_T = \omega_I + e_T$ but $a_t = e_t$ otherwise.
So $\omega_I$ can be estimated by $\tilde\omega_I = a_T$ with variance equal to $\sigma^2$. Thus, a test statistic for
testing for an IO at T is

$$\lambda_{1,T} = \frac{a_T}{\sigma}$$  (11.2.4)

which  has  (approximately)  a  standard  normal  distribution  under  the  null  hypothesis  that 
there  are  no  outliers  in  the  time  series.  When  T  is  known  beforehand,  the  observation  in 
question  is  declared  an  outlier  if  the  corresponding  standardized  residual  exceeds  1.96 
in  magnitude  at  the  5%  significance  level.  In  practice,  there  is  often  no  prior  knowledge 
about T, and the test is applied to all observations. In addition, $\sigma$ will need to be
estimated. A simple conservative procedure is to use the Bonferroni rule for controlling the
overall error rate of multiple tests. Let

$$\lambda_1 = \max_{1 \le t \le n} |\lambda_{1,t}|$$  (11.2.5)

be attained at t = T. Then the Tth observation is deemed an IO if $\lambda_1$ exceeds the upper
0.025/n × 100 percentile of the standard normal distribution. This procedure guarantees
that there is at most a 5% probability of a false detection of an IO. Note that an outlier
will inflate the maximum likelihood estimate of $\sigma$, so if there is no adjustment for outliers,
the power of most tests is usually reduced. A robust estimate of the noise standard
deviation may be used in lieu of the maximum likelihood estimate to increase the power
of the test. For example, $\sigma$ can be more robustly estimated by the mean absolute residual
times $\sqrt{\pi/2}$, since $E|e_t| = \sigma\sqrt{2/\pi}$ for normally distributed errors.
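
The Bonferroni procedure with a robust scale estimate can be sketched in a few lines of base R. This is an illustration only: the residuals here are simulated white noise standing in for the residuals of a fitted model, and the outlier size and position are hypothetical.

```r
# Sketch of the Bonferroni test for an IO using a robust estimate of sigma.
set.seed(1)                                  # arbitrary seed for this sketch
n <- 100
a <- rnorm(n)                                # stand-in for model residuals (assumed white noise)
a[40] <- a[40] + 8                           # inject a hypothetical IO of size 8 at t = 40

sigma_rob <- sqrt(pi / 2) * mean(abs(a))     # mean absolute residual times sqrt(pi/2)
lambda1 <- a / sigma_rob                     # lambda_{1,t} for every t, as in (11.2.4)
crit <- qnorm(1 - 0.025 / n)                 # upper (0.025/n) x 100 percentile of N(0,1)

flagged <- which(abs(lambda1) > crit)        # time points declared IO
print(flagged)
```

Replacing `sigma_rob` by `sqrt(mean(a^2))` gives the nonrobust version of the same test, which can be compared on the same residuals.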

The  detection  of  an  AO  is  more  complex.  Suppose  that  the  process  admits  an  AO  at 
T  and  is  otherwise  free  of  outliers.  Then  it  can  be  shown  that 

$$a_t = -\omega_A \pi_{t-T} + e_t$$  (11.2.6)

where $\pi_0 = -1$ and $\pi_j = 0$ for negative j. Hence, $a_t = e_t$ for t < T, $a_T = \omega_A + e_T$,
$a_{T+1} = -\omega_A\pi_1 + e_{T+1}$, $a_{T+2} = -\omega_A\pi_2 + e_{T+2}$, and so forth. A least squares estimator of $\omega_A$
is

$$\tilde\omega_{T,A} = -\rho^2 \sum_{t=1}^{n} \pi_{t-T}\, a_t$$  (11.2.7)

where $\rho^2 = (1 + \pi_1^2 + \pi_2^2 + \cdots + \pi_{n-T}^2)^{-1}$, with the variance of the estimate being
equal to $\rho^2\sigma^2$. We can then define

$$\lambda_{2,T} = \frac{\tilde\omega_{T,A}}{\rho\sigma}$$  (11.2.8)

as the test statistic for testing the null hypothesis that the time series has no outliers versus
the alternative hypothesis of an AO at T. As before, $\rho$ and $\sigma$ will need to be estimated.
The test statistic $\lambda_{2,T}$ is approximately distributed as N(0,1) under the null
hypothesis. Again, T is often unknown, and the test is applied repeatedly to each time
point. The Bonferroni rule may again be applied to control the overall error rate. Furthermore,
the nature of an outlier is not known beforehand. In the case where an outlier
is detected at T, it may be classified to be an IO if $|\lambda_{1,T}| > |\lambda_{2,T}|$ and an AO otherwise.
See Chang et al. (1988) for another approach to classifying the nature of an outlier.
When  an  outlier  is  found,  it  can  be  incorporated  into  the  model,  and  the  outlier-detection 
procedure  can  then  be  repeated  with  the  refined  model  until  no  more  outliers  are  found. 
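
To make Equations (11.2.4), (11.2.7), and (11.2.8) concrete, here is a minimal sketch for the special case of an AR(1) model with known coefficient, where $\pi_1 = \phi$ and $\pi_j = 0$ for j ≥ 2. The series, the seed, and the outlier size are hypothetical, and the first residual is only approximate because $Y'_0$ is set to zero.

```r
# Sketch of the AO and IO statistics for an AR(1) model with known phi.
set.seed(2)                                   # arbitrary seed
n <- 100; phi <- 0.8
y <- as.numeric(arima.sim(model = list(ar = phi), n = n))
y[10] <- y[10] + 10                           # hypothetical AO with omega_A = 10 at T = 10

# Residuals a_t = Y'_t - phi * Y'_{t-1}; Y'_0 is taken as 0, so a_1 is approximate.
a <- y - phi * c(0, y[-n])
sigma <- sqrt(pi / 2) * mean(abs(a))          # robust estimate of sigma

lambda1 <- a / sigma                          # IO statistic (11.2.4)

# AO estimate (11.2.7): -rho^2 * (pi_0 * a_T + pi_1 * a_{T+1}) with pi_0 = -1
a_next <- c(a[-1], 0)                         # a_{T+1}; no such term at T = n
rho2 <- c(rep(1 / (1 + phi^2), n - 1), 1)     # rho^2 = 1 / (1 + pi_1^2), or 1 at T = n
omega_hat <- rho2 * (a - phi * a_next)
lambda2 <- omega_hat / (sqrt(rho2) * sigma)   # AO statistic (11.2.8)

# Locate the largest statistic and classify it by comparing magnitudes.
T_hat <- which.max(pmax(abs(lambda1), abs(lambda2)))
type <- if (abs(lambda1[T_hat]) > abs(lambda2[T_hat])) "IO" else "AO"
print(c(T_hat = T_hat, lambda1 = lambda1[T_hat], lambda2 = lambda2[T_hat]))
print(type)
```

Because an AO at T also disturbs the residual at T + 1, the statistics at the neighboring time points tend to be large as well, which is why the classification is based on the largest magnitude.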

As a first example, we simulated a time series of length n = 100 from the
ARIMA(1,0,1) model with $\phi = 0.8$ and $\theta = -0.5$. We then changed the 10th observation
from -2.13 to 10 (that is, $\omega_A = 12.13$); see Exhibit 11.9. Based on the sample ACF,
PACF and EACF, an AR(1) model was tentatively identified. Based on the Bonferroni
rule, the 9th, 10th, and 11th observations were found to be possible additive outliers
with the corresponding robustified test statistics being -3.54, 9.55, and -5.20. The test
for IO revealed that the 10th and 11th observations may be IO, with the corresponding
robustified test statistics being 7.11 and -6.64. Because among the tests for AO and IO
the largest magnitude occurs for the test for AO at T = 10, the 10th observation was tentatively
marked as an AO. Note that the nonrobustified test statistic for AO at T = 10
equals 7.49, which is substantially less than the more robust test value of 9.55, showing
that robustifying the estimate of the noise standard deviation does increase the power of
the test. After incorporating the AO in the model, no more outliers were found. However,
the lag 1 residual ACF was significant, suggesting the need for an MA(1) component.
Hence, an ARIMA(1,0,1) + AO at T = 10 model was fitted to the data. This model
was found to have no additional outliers and passed all model diagnostic checks.


Exhibit  11.9  Simulated  ARIMA(1,0,1)  Process  with  an  Additive  Outlier 


[Time series plot; horizontal axis: Time]

> The extensive R code for the simulation and analysis of this
  example may be found in the R code script file for Chapter 11.

For a real example, we return to the seasonal ARIMA(0,1,1)×(0,1,1)$_{12}$ model that
we fitted to the carbon dioxide time series in Chapter 10. The time series plot of the
standardized residuals from this model, shown in Exhibit 10.11 on page 238, showed a
suspiciously large standardized residual in September 1998. Calculation shows that
there is no evidence of an additive outlier, as $\lambda_{2,t}$ is not significantly large for any t.
However, the robustified $\lambda_1 = \max_{1 \le t \le n} |\lambda_{1,t}| = 3.7527$, which is attained at t = 57,
corresponding to September 1998. The Bonferroni critical value with $\alpha = 5\%$ and n = 132
is 3.5544. So our observed $\lambda_1$ is large enough to claim significance for an innovation
outlier in September 1998. Exhibit 11.10 shows the results of fitting the ARIMA(0,1,1)
×(0,1,1)$_{12}$ model with an IO at t = 57 to the CO2 time series. These results should be
compared with the earlier results shown in Exhibit 10.10 on page 237, where the outlier
was not taken into account. Notice that the estimates of $\theta$ and $\Theta$ have not changed very
much, the AIC is better (that is, smaller), and the IO effect is highly significant. Diagnostics
based on this model turn out to be excellent, no further outliers are detected, and
we have a very adequate model for this seasonal time series.
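
The Bonferroni critical value quoted in this example can be checked with a one-line calculation. The sketch below uses only base R; the variable names are our own.

```r
# Bonferroni critical value for a two-sided 5% test applied to n = 132 residuals
n <- 132; alpha <- 0.05
crit <- qnorm(1 - (alpha / 2) / n)   # upper (0.025/n) x 100 percentile of N(0,1)
lambda1_obs <- 3.7527                # robustified statistic reported in the text
print(round(crit, 4))
print(lambda1_obs > crit)            # TRUE means the IO is declared significant
```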


Exhibit 11.10 ARIMA(0,1,1)×(0,1,1)$_{12}$ Model with IO at t = 57 for CO2 Series


Coefficient       $\theta$   $\Theta$   IO-57
Estimate          0.5925     0.8274     2.6770
Standard Error    0.0775     0.1016     0.7246

$\hat\sigma_e^2$ = 0.4869: log-likelihood = -133.08, AIC = 272.16

> m1.co2=arima(co2,order=c(0,1,1),seasonal=list(order=c(0,1,1),
   period=12)); m1.co2
> detectAO(m1.co2); detectIO(m1.co2)
> m4.co2=arimax(co2,order=c(0,1,1),seasonal=list(order=c(0,1,1),
   period=12),io=c(57)); m4.co2


11.3  Spurious  Correlation 


A  main  purpose  of  building  a  time  series  model  is  for  forecasting,  and  the  ARIMA 
model  does  this  by  exploiting  the  autocorrelation  pattern  in  the  data.  Often,  the  time 
series  under  study  may  be  related  to,  or  led  by,  some  other  covariate  time  series.  For 
example,  Stige  et  al.  (2006)  found  that  pasture  production  in  Africa  is  generally  related 
to  some  climatic  indices.  In  such  cases,  better  understanding  of  the  underlying  process 
and/or  more  accurate  forecasts  may  be  achieved  by  incorporating  relevant  covariates 
into  the  time  series  model. 

Let $Y = \{Y_t\}$ be the time series of the response variable and $X = \{X_t\}$ be a covariate
time series that we hope will help explain or forecast Y. To explore the correlation structure
between X and Y and their lead-led relationship, we define the cross-covariance
function $\gamma_{t,s}(X,Y) = Cov(X_t, Y_s)$ for each pair of integers t and s. Stationarity of a univariate
time series can be easily extended to the case of multivariate time series. For example,
X and Y are jointly (weakly) stationary if their means are constant and the
covariance $\gamma_{t,s}(X,Y)$ is a function of the time difference t - s. For jointly stationary processes,
the cross-correlation function between X and Y at lag k can then be defined by
$\rho_k(X,Y) = Corr(X_t, Y_{t-k}) = Corr(X_{t+k}, Y_t)$. Note that if Y = X, the cross-correlation
becomes the autocorrelation of Y at lag k. The coefficient $\rho_0(Y,X)$ measures the contemporaneous
linear association between X and Y, whereas $\rho_k(X,Y)$ measures the linear
association between $X_t$ and that of $Y_{t-k}$. Recall that the autocorrelation function is an
even function, that is, $\rho_k(Y,Y) = \rho_{-k}(Y,Y)$. (This is because $Corr(Y_t, Y_{t-k}) =
Corr(Y_{t-k}, Y_t) = Corr(Y_t, Y_{t+k})$, by stationarity.) However, the cross-correlation function
is generally not an even function since $Corr(X_t, Y_{t-k})$ need not equal $Corr(X_t, Y_{t+k})$.

As an illustration, consider the regression model

$$Y_t = \beta_0 + \beta_1 X_{t-d} + e_t$$  (11.3.1)

where the X's are independent, identically distributed random variables with variance
$\sigma_X^2$ and the e's are also white noise with variance $\sigma_e^2$ and are independent of the X's. It
can be checked that the cross-correlation function (CCF) $\rho_k(X,Y)$ is identically zero
except for lag k = -d, where

$$\rho_{-d}(X,Y) = \frac{\beta_1\sigma_X}{\sqrt{\beta_1^2\sigma_X^2 + \sigma_e^2}}$$  (11.3.2)

In this case, the theoretical CCF is nonzero only at lag -d, reflecting the fact that X is
“leading” Y by d units of time. The CCF can be estimated by the sample cross-correlation
function (sample CCF) defined by

$$r_k(X,Y) = \frac{\sum (X_t - \bar X)(Y_{t-k} - \bar Y)}{\sqrt{\sum (X_t - \bar X)^2}\,\sqrt{\sum (Y_t - \bar Y)^2}}$$  (11.3.3)

where the summations are done over all data where the summands are available. The
sample CCF becomes the sample ACF when Y = X. The covariate X is independent of Y
if and only if $\beta_1 = 0$, in which case the sample cross-correlation $r_k(X,Y)$ is approximately
normally distributed with zero mean and variance 1/n, where n is the sample size (the
number of pairs $(X_t, Y_t)$ available). Sample cross-correlations that are larger than
$1.96/\sqrt{n}$ in magnitude are then deemed significantly different from zero.

We have simulated 100 pairs of $(X_t, Y_t)$ from the model of Equation (11.3.1) with d
= 2, $\beta_0 = 0$, and $\beta_1 = 1$. The X's and e's are generated as normal random variables
distributed as N(0,1) and N(0,0.25), respectively. Theoretically, the CCF should then be
zero except at lag -2, where it equals $\rho_{-2}(X,Y) = 1/\sqrt{1 + 0.25} = 0.8944$. Exhibit
11.11 shows the sample CCF of the simulated data, which is significant at lags -2 and 3.
But the sample CCF at lag 3 is quite small and only marginally significant. Such a false
alarm is not unexpected as the exhibit displays a total of 33 sample CCF values out of
which we may expect 33×0.05 = 1.65 false alarms on average.


Exhibit 11.11 Sample Cross-Correlation from Equation (11.3.1) with d = 2


> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(12345); X=rnorm(105); Y=zlag(X,2)+.5*rnorm(105)
> X=ts(X[-(1:5)],start=1,freq=1); Y=ts(Y[-(1:5)],start=1,freq=1)
> ccf(X,Y,ylab='CCF')


Even though $X_{t-2}$ correlates with $Y_t$, the regression model considered above is
rather restrictive, as X and Y are each white noise series. For stationary time series, the
response variable and the covariate are each generally autocorrelated, and the error term
of the regression model is also generally autocorrelated. Hence a more useful regression
model is given by

$$Y_t = \beta_0 + \beta_1 X_{t-d} + Z_t$$  (11.3.4)

where $Z_t$ may follow some ARIMA(p,d,q) model. Even if the processes X and Y are
independent of each other ($\beta_1 = 0$), the autocorrelations in Y and X have the unfortunate
consequence of implying that the sample CCF is no longer approximately N(0,1/n).
Under the assumption that both X and Y are stationary and that they are independent of
each other, it turns out that the sampling variance of $r_k(X,Y)$ tends to be different from 1/n.
Indeed, it may be shown that the variance of $\sqrt{n}\,r_k(X,Y)$ is approximately

$$1 + 2\sum_{k=1}^{\infty} \rho_k(X)\rho_k(Y)$$  (11.3.5)

where $\rho_k(X)$ is the autocorrelation of X at lag k and $\rho_k(Y)$ is similarly defined for the
Y-process. For refinement of this asymptotic result, see Box et al. (1994, p. 413). Suppose
X and Y are both AR(1) processes with AR(1) coefficients $\phi_X$ and $\phi_Y$, respectively.
Then $r_k(X,Y)$ is approximately normally distributed with zero mean, but the variance is
now approximately equal to

$$\frac{1 + \phi_X\phi_Y}{n(1 - \phi_X\phi_Y)}$$  (11.3.6)

When both AR(1) coefficients are close to 1, the ratio of the sampling variance of
$r_k(X,Y)$ to the nominal value of 1/n approaches infinity. Thus, the unquestioned use of
the 1/n rule in deciding the significance of the sample CCF may lead to many more false
positives than the nominal 5% error rate, even though the response and covariate time
series are independent of each other. Exhibit 11.12 shows some numerical results for the
case where $\phi_X = \phi_Y = \phi$.
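
As a check on (11.3.6), and assuming only that the AR(1) autocorrelations are $\rho_k(X) = \phi_X^k$ and $\rho_k(Y) = \phi_Y^k$, the sum in (11.3.5) is a geometric series. The second line below is our own restatement, not a formula from the text; it gives the implied asymptotic rejection probability of the nominal 5% rule when $\phi_X = \phi_Y = \phi$.

```latex
\begin{aligned}
1 + 2\sum_{k=1}^{\infty} (\phi_X\phi_Y)^k
  &= 1 + \frac{2\phi_X\phi_Y}{1 - \phi_X\phi_Y}
   = \frac{1 + \phi_X\phi_Y}{1 - \phi_X\phi_Y}, \\
\Pr\!\left(|r_k(X,Y)| > \frac{1.96}{\sqrt{n}}\right)
  &\approx 2\left[1 - \Phi\!\left(1.96\sqrt{\frac{1-\phi^2}{1+\phi^2}}\right)\right].
\end{aligned}
```

Dividing the first line by n gives (11.3.6). For $\phi = 0.9$, the second line evaluates to roughly 0.53, so more than half of such tests would falsely reject independence.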


Exhibit  11.12  Asymptotic  Error  Rates  of  a  Nominal  5%  Test  of 
Independence  for  a  Pair  of  AR(1)  Processes 

$\phi = \phi_X = \phi_Y$   0.00   0.15   0.30   0.45   0.60   0.75   0.90
Error Rate                 5%     6%     7%     11%    18%    30%    53%

> phi=seq(0,.95,.15)
> rejection=2*(1-pnorm(1.96*sqrt((1-phi^2)/(1+phi^2))))
> M=signif(rbind(phi,rejection),2)
> rownames(M)=c('phi','Error Rate')
> M


The problem of inflated variance of the sample cross-correlation coefficients
becomes more acute for nonstationary data. In fact, the sample cross-correlation coefficients
may no longer be approximately normally distributed even with a large sample
size. Exhibit 11.13 displays the histogram of 1000 simulated lag zero cross-correlations
between two independent IMA(1,1) processes each of size 500. An MA(1) coefficient
of $\theta = 0.8$ was used for both simulated processes. Note that the distribution of $r_0(X,Y)$ is
far from normal and widely dispersed between -1 and 1. See Phillips (1998) for a relevant
theoretical discussion.


Exhibit  11.13  Histogram  of  1000  Sample  Lag  Zero  Cross-Correlations  of 
Two  Independent  IMA(1,1)  Processes  Each  of  Size  500 


[Histogram; horizontal axis: $r_0(X,Y)$, from -1.0 to 1.0]

> set.seed(23457)
> correlation.v=NULL; B=1000; n=500
> for (i in 1:B) {x=cumsum(arima.sim(model=list(ma=.8),n=n))
   y=cumsum(arima.sim(model=list(ma=.8),n=n))
   correlation.v=c(correlation.v,ccf(x,y,lag.max=1,
   plot=F)$acf[2])}
> hist(correlation.v,prob=T,xlab=expression(r[0](X,Y)))


These  results  provide  insight  into  why  we  sometimes  obtain  nonsense  (spurious) 
correlation  between  time  series  variables.  The  phenomenon  of  spurious  correlation  was 
first  studied  systematically  by  Yule  (1926). 

As  an  example,  the  monthly  milk  production  and  the  logarithms  of  monthly  elec¬ 
tricity  production  in  the  United  States  from  January  1994  to  December  2005  are  shown 
in  Exhibit  11.14.  Both  series  have  an  upward  trend  and  are  highly  seasonal. 


Exhibit  11.14  Monthly  Milk  Production  and  Logarithms  of  Monthly 
Electricity  Production  in  the  U.S. 


[Time series plots of the two series; horizontal axis: Time]


> data(milk); data(electricity)
> milk.electricity=ts.intersect(milk,log(electricity))
> plot(milk.electricity,yax.flip=T)


Calculation shows that these series have a cross-correlation coefficient at lag zero
of 0.54, which is “statistically significantly different from zero” as judged against the
standard error criterion of $1.96/\sqrt{n} = 0.16$. Exhibit 11.15 displays the strong
cross-correlations between these two variables at a large number of lags.

Needless to say, it is difficult to come up with a plausible reason for the relationship
between monthly electricity production and monthly milk production. The nonstationarity
in the milk production series and in the electricity series is more likely the cause of
the spurious correlations found between the two series. The following section contains
further discussion of this example.


Exhibit 11.15 Sample Cross-Correlation Between Monthly Milk Production
and Logarithm of Monthly Electricity Production in the U.S.


[Sample CCF plot; horizontal axis: Lag]

> ccf(as.vector(milk.electricity[,1]),
   as.vector(milk.electricity[,2]),ylab='CCF')


11.4  Prewhitening  and  Stochastic  Regression 


In  the  preceding  section,  we  found  that  with  strongly  autocorrelated  data  it  is  difficult  to 
assess  the  dependence  between  the  two  processes.  Thus,  it  is  pertinent  to  disentangle  the 
linear  association  between  X  and  Y,  say,  from  their  autocorrelation.  A  useful  device  for 
doing this is prewhitening. Recall that, for the case of stationary X and Y that are independent
of each other, the variance of $r_k(X,Y)$ is approximately

$$\frac{1}{n}\left[1 + 2\sum_{k=1}^{\infty} \rho_k(X)\rho_k(Y)\right]$$  (11.4.1)


An examination of this formula reveals that the approximate variance is 1/n if either one
(or both) of X or Y is a white noise process. In practice, the data may be nonstationary,
but they may be transformed to approximately white noise by replacing the data by the
residuals from a fitted ARIMA model. For example, if X follows an ARIMA(1,1,0)
model with no intercept term, then

$$\tilde X_t = X_t - X_{t-1} - \phi(X_{t-1} - X_{t-2}) = [1 - (1+\phi)B + \phi B^2] X_t$$  (11.4.2)

is white noise. More generally, if $X_t$ follows some invertible ARIMA(p,d,q) model, then
it admits an AR($\infty$) representation

$$\tilde X_t = (1 - \pi_1 B - \pi_2 B^2 - \cdots) X_t = \pi(B) X_t$$

where the $\tilde X$'s are white noise. The process of transforming the X's to the $\tilde X$'s via the
filter $\pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \cdots$ is known as whitening or prewhitening. We now can
study the CCF between X and Y by prewhitening the Y and X using the same filter based
on the X process and then computing the CCF of $\tilde Y$ and $\tilde X$; that is, the prewhitened Y
and X. Since prewhitening is a linear operation, any linear relationships between the
original series will be preserved after prewhitening. Note that we have abused the terminology,
as $\tilde Y$ need not be white noise because the filter $\pi(B)$ is tailor-made only to transform
X to a white noise process, not Y. We assume, furthermore, that $\tilde Y$ is stationary.
This approach has two advantages: (i) the statistical significance of the sample CCF of
the prewhitened data can be assessed using the cutoff $1.96/\sqrt{n}$, and (ii) the theoretical
counterpart of the CCF so estimated is proportional to certain regression coefficients.

To  see  (ii),  consider  a  more  general  regression  model  relating  X  to  Y  and,  without 
loss  of  generality,  assume  both  processes  have  zero  mean: 

$$Y_t = \sum_{j=-\infty}^{\infty} \beta_j X_{t-j} + Z_t$$  (11.4.3)

where X is independent of Z and the coefficients $\beta$ are such that the process is
well-defined. In this model, the coefficients $\beta_k$ could be nonzero for any integer k. However,
in real applications, the doubly infinite sum is often a finite sum so that the model
simplifies to

$$Y_t = \sum_{j=m_1}^{m_2} \beta_j X_{t-j} + Z_t$$  (11.4.4)

which will be assumed below even though we retain the doubly infinite summation
notation for ease of exposition. If the summation ranges only over a finite set of positive
indices, then X leads Y and the covariate X serves as a useful leading indicator for
future Y's. Applying the filter $\pi(B)$ to both sides of this model, we get

$$\tilde Y_t = \sum_{k=-\infty}^{\infty} \beta_k \tilde X_{t-k} + \tilde Z_t$$  (11.4.5)

where $\tilde Z_t = Z_t - \pi_1 Z_{t-1} - \pi_2 Z_{t-2} - \cdots$. The prewhitening procedure thus orthogonalizes
the various lags of X in the original regression model. Because $\tilde X$ is a white noise
sequence and $\tilde X$ is independent of $\tilde Z$, the theoretical cross-correlation coefficient
between $\tilde X$ and $\tilde Y$ at lag k equals $\beta_{-k}(\sigma_{\tilde X}/\sigma_{\tilde Y})$. In other words, the theoretical
cross-correlation of the prewhitened processes at lag k is proportional to the regression
coefficient $\beta_{-k}$.
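
The proportionality can be verified in two lines. Writing $\sigma_{\tilde X}$ and $\sigma_{\tilde Y}$ for the standard deviations of the prewhitened series, and using the facts that $\tilde X$ is white noise and independent of $\tilde Z$, only the term with j = -k survives in the covariance:

```latex
\begin{aligned}
\mathrm{Cov}(\tilde X_t, \tilde Y_{t-k})
  &= \sum_{j=-\infty}^{\infty} \beta_j\,\mathrm{Cov}(\tilde X_t, \tilde X_{t-k-j})
     + \mathrm{Cov}(\tilde X_t, \tilde Z_{t-k})
   = \beta_{-k}\,\sigma_{\tilde X}^2, \\
\rho_k(\tilde X, \tilde Y)
  &= \frac{\beta_{-k}\,\sigma_{\tilde X}^2}{\sigma_{\tilde X}\,\sigma_{\tilde Y}}
   = \beta_{-k}\,\frac{\sigma_{\tilde X}}{\sigma_{\tilde Y}}.
\end{aligned}
```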

For  a  quick  preliminary  analysis,  an  approximate  prewhitening  can  be  done  easily 
by  first  differencing  the  data  (if  needed)  and  then  fitting  an  approximate  AR  model  with 
the  order  determined  by  minimizing  the  AIC.  For  example,  for  the  milk  production  and 
electricity  consumption  data,  both  are  highly  seasonal  and  contain  trends.  Consequently, 
they  can  be  differenced  with  both  regular  differencing  and  seasonal  differencing,  and 
then  the  prewhitening  can  be  carried  out  by  filtering  both  differenced  series  by  an  AR 
model  fitted  to  the  differenced  milk  data.  Exhibit  11.16  shows  the  sample  CCF  between 
the  prewhitened  series.  None  of  the  cross-correlations  are  now  significant  except  for  lag 
-3,  which  is  just  marginally  significant.  The  lone  significant  cross-correlation  is  likely  a 
false alarm since we expect about 1.75 false alarms out of the 35 sample cross-correlations
examined. Thus, it seems that milk production and electricity consumption are in
fact  largely  uncorrelated,  and  the  strong  cross-correlation  pattern  found  between  the  raw 
data  series  is  indeed  spurious. 
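
The quick approximate prewhitening described above can be sketched by hand in base R. This is an illustration under stated assumptions, not the implementation behind the prewhiten function used in the exhibits: the two series are simulated independent AR(1) processes, the maximum AR order of 5 is an arbitrary choice, and all variable names are our own.

```r
# Sketch: approximate prewhitening with an AR model selected by AIC.
set.seed(3)                                          # arbitrary seed
n <- 200
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))   # independent of x

fit <- ar(x, order.max = 5, aic = TRUE)              # AR model for x, order chosen by AIC
pi_filter <- c(1, -fit$ar)                           # coefficients of pi(B) = 1 - pi_1 B - ...

# Apply the same one-sided filter to both (mean-corrected) series.
x_tilde <- stats::filter(x - mean(x), pi_filter, method = "convolution", sides = 1)
y_tilde <- stats::filter(y - mean(y), pi_filter, method = "convolution", sides = 1)

keep <- !is.na(x_tilde)                              # the first few filtered values are NA
x_w <- as.numeric(x_tilde[keep])
y_w <- as.numeric(y_tilde[keep])

cc <- ccf(x_w, y_w, lag.max = 10, plot = FALSE)      # sample CCF of the prewhitened series
print(round(cbind(lag = cc$lag[, 1, 1], ccf = cc$acf[, 1, 1]), 3))
print(1.96 / sqrt(length(x_w)))                      # cutoff for individual lags
```

Since the simulated series are independent, we would expect only about 5% of the printed cross-correlations to exceed the cutoff in magnitude.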


Exhibit  11.16  Sample  CCF  of  Prewhitened  Milk  and  Electricity  Production 


[Sample CCF plot; horizontal axis: Lag, from -15 to 15]

> me.dif=ts.intersect(diff(diff(milk,12)),
   diff(diff(log(electricity),12)))
> prewhiten(as.vector(me.dif[,1]),as.vector(me.dif[,2]),
   ylab='CCF')


The  model  defined  by  Equation  (11.3.4)  on  page  262  is  known  variously  as  the 
transfer-function  model,  the  distributed-lag  model,  or  the  dynamic  regression  model. 
The  specification  of  which  lags  of  the  covariate  enter  into  the  model  is  often  done  by 
inspecting  the  sample  cross-correlation  function  based  on  the  prewhitened  data.  When 
the  model  appears  to  require  a  fair  number  of  lags  of  the  covariate,  the  regression  coeffi¬ 
cients  may  be  parsimoniously  specified  via  an  ARMA  specification  similar  to  the  case 
of intervention analysis; see Box et al. (1994, Chapter 11) for some details. We illustrate
the method below with two examples where only one lag of the covariate appears to be
needed. The specification of the stochastic noise process $Z_t$ can be done by examining
the  residuals  from  an  ordinary  least  squares  (OLS)  fit  of  Y  on  X  using  the  techniques 
learned  in  earlier  chapters. 

Our  first  example  of  this  section  is  a  sales  and  price  dataset  of  a  certain  potato  chip 
from  Bluebird  Foods  Ltd.,  New  Zealand.  The  data  consist  of  the  log-transformed 
weekly  unit  sales  of  large  packages  of  standard  potato  chips  sold  and  the  weekly  aver¬ 
age  price  over  a  period  of  104  weeks  from  September  20,  1998  through  September  10, 
2000;  see  Exhibit  11.17.  The  logarithmic  transformation  is  needed  because  the  sales 
data  are  highly  skewed  to  the  right.  These  data  are  clearly  nonstationary.  Exhibit  11.18 
shows that, after differencing and using prewhitened data, the CCF is significant only at
lag 0, suggesting a strong contemporaneous negative relationship between price
and sales. Higher prices are associated with lower sales.


Exhibit 11.17 Weekly Log(Sales) and Price for Bluebird Potato Chips


[Time series plots of log(sales) and price; horizontal axis: Time]

> data(bluebird)
> plot(bluebird,yax.flip=T)


Exhibit  11.18  Sample  Cross  Correlation  Between  Prewhitened  Differenced 
Log(Sales)  and  Price  of  Bluebird  Potato  Chips 


[Sample CCF plot; horizontal axis: Lag, from -15 to 15]

> prewhiten(y=diff(bluebird)[,1],x=diff(bluebird)[,2],ylab='CCF')


Exhibit  11.19  reports  the  estimates  from  the  OLS  regression  of  log(sales)  on  price. 
The  residuals  are,  however,  autocorrelated,  as  can  be  seen  from  their  sample  ACF  and 
PACF displayed in Exhibits 11.20 and 11.21, respectively. Indeed, the sample autocorrelations
of the residuals are significant for the first four lags, whereas the sample partial
autocorrelations are significant at lags 1, 2, 4, and 14.


Exhibit 11.19 OLS Regression Estimates of Log(Sales) on Price


            Estimate   Std. Error   t value   Pr(>|t|)
Intercept   15.90      0.2170       73.22     < 0.0001
Price       -2.489     0.1260       -19.75    < 0.0001

> sales=bluebird[,1]; price=bluebird[,2]
> chip.m1=lm(sales~price,data=bluebird)
> summary(chip.m1)


Exhibit  11.20  Sample  ACF  of  Residuals  from  OLS  Regression  of 
Log(Sales)  on  Price 


[Sample ACF plot; horizontal axis: Lag]

> acf(residuals(chip.m1),ci.type='ma')


Exhibit  11.21  Sample  PACF  of  Residuals  from  OLS  Regression  of 
Log(Sales)  on  Price 


[Sample PACF plot; horizontal axis: Lag]

> pacf(residuals(chip.m1))

The sample EACF of the residuals, shown in Exhibit 11.22, contains a triangle of
zeros with a vertex at (1,4), thereby suggesting an ARMA(1,4) model. Hence, we fit a
regression model of log(sales) on price with an ARMA(1,4) error.


Exhibit 11.22 The Sample EACF of the Residuals from the OLS
Regression of Log(Sales) on Price

> eacf(residuals(chip.m1))

It turns out that the estimates of the AR(1) coefficient and the MA coefficients $\theta_1$
and $\theta_3$ are not significant, and hence a model fixing these coefficients to be zero was
subsequently fitted and reported in Exhibit 11.23.


Exhibit  11.23  Maximum  Likelihood  Estimates  of  a  Regression  Model  of 
Log(sales)  on  Price  with  a  Subset  MA(4)  for  the  Errors 


Parameter         $\theta_1$   $\theta_2$   $\theta_3$   $\theta_4$   Intercept   Price
Estimate          0            -0.2884      0            -0.5416      15.86       -2.468
Standard Error    0            0.0794       0            0.1167       0.1909      0.1100

$\sigma^2$ estimated as 0.02623: log likelihood = 41.02, AIC = -70.05

> chip.m2=arima(sales,order=c(1,0,4),xreg=data.frame(price))
> chip.m2
> chip.m3=arima(sales,order=c(1,0,4),xreg=data.frame(price),
   fixed=c(NA,0,NA,0,NA,NA,NA)); chip.m3
> chip.m4=arima(sales,order=c(0,0,4),xreg=data.frame(price),
   fixed=c(0,NA,0,NA,NA,NA)); chip.m4

Note  that  the  regression  coefficient  estimate  on  Price  is  similar  to  that  from  the  OLS 
regression  fit  earlier,  but  the  standard  error  of  the  estimate  is  about  10%  lower  than  that 
from  the  simple  OLS  regression.  This  illustrates  the  general  result  that  the  simple  OLS 
estimator  is  consistent  but  the  associated  standard  error  is  generally  not  trustworthy. 
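
The comparison between the OLS fit and the fit with autocorrelated errors can be reproduced on simulated data. The sketch below does not use the Bluebird data: the covariate, the MA(2) error process, and the regression coefficients are hypothetical values chosen only to mimic the structure of the example.

```r
# Sketch: regression with MA errors, OLS versus maximum likelihood via arima().
set.seed(4)                                          # arbitrary seed
n <- 104
price <- as.numeric(arima.sim(model = list(ar = 0.5), n = n))            # hypothetical covariate
z <- as.numeric(arima.sim(model = list(ma = c(0.3, 0.5)), n = n, sd = 0.2))  # MA(2) errors
sales <- 15 - 2.5 * price + z                        # hypothetical regression relationship

ols <- lm(sales ~ price)                             # ignores the error autocorrelation
mle <- arima(sales, order = c(0, 0, 2), xreg = data.frame(price))

# Slope estimates and their reported standard errors under each fit
est <- c(ols = unname(coef(ols)["price"]), mle = unname(coef(mle)["price"]))
se <- c(ols = summary(ols)$coefficients["price", "Std. Error"],
        mle = unname(sqrt(diag(mle$var.coef))["price"]))
print(round(rbind(est, se), 4))
```

Both slope estimates should be close to the true value of -2.5, while the two standard errors generally differ because only the second fit accounts for the autocorrelation in the errors.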




The residuals from this fitted model by and large pass various model diagnostic
tests except that the residual ACF is significant at lag 14. As a result, some Box-Ljung
test statistics have p-values bordering on 0.05 when 14 or more lags of the residual
autocorrelations are included in the test. Even though the significant ACF at lag 14 may
suggest a quarterly effect, we do not report a more complex model including lag 14 because
(1) 14 weeks do not exactly make a quarter and (2) adding a seasonal MA(1) component
of period 14 only results in marginal improvement in terms of model diagnostics.

For a second example, we study the impact of higher gasoline prices on public
transportation usage. The dataset consists of the monthly number of boardings on public
transportation in the Denver, Colorado, region together with the average monthly
gasoline prices in Denver from August 2000 through March 2006. Both variables are skewed
to the right and hence are log-transformed. As we shall see below, the logarithmic
transformation also makes the final fitted model more interpretable. The time series plots,
shown in Exhibit 11.24, display the increasing trends for both variables and the seasonal
fluctuation in the number of boardings. Based on the sample ACF and PACF, an
ARIMA(2,1,0) model was fitted to the gasoline price data. This fitted model was then
used to filter the boardings data before computing their sample CCF, which is shown in
Exhibit 11.25. The sample CCF is significant at lags 0 and 15, suggesting positive
contemporaneous correlation between gasoline price and public transportation usage. The
significant CCF at lag 15, however, is unlikely to be real, as it is hard to imagine why the
number of boardings might lead the gasoline price with a lag of 15 months. By contrast,
the quick preliminary approach of prewhitening the series by fitting a long AR model
showed none of the CCFs to be significant. It turns out that, even after differencing
the data, the AIC selects an AR(16) model. The high order selected, coupled
with the relatively short time span, may substantially weaken the power to detect
correlations between the two variables. Incidentally, this example warns against simply
relying on the AIC to select a high-order AR model for prewhitening, especially with
relatively short time series data.
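
The mechanics of prewhitening can be sketched in base R. The example below does not use the Denver data: it simulates a covariate x and a response y that depends on x two periods earlier, and every name and parameter value in it is an assumption made for illustration. It fits a low-order AR model to x, applies the same filter to both series, and computes the sample CCF of the filtered series. The TSA function prewhiten used in this section carries out these steps in a single call.

```r
set.seed(1)
n <- 300
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))
# Hypothetical relationship: y depends on x two periods earlier, plus noise
y <- 0.8 * c(0, 0, x[1:(n - 2)]) + rnorm(n)

# Step 1: fit an AR(1) model to the covariate x
phi <- ar(x, order.max = 1, aic = FALSE)$ar

# Step 2: apply the filter (1 - phi*B) to both series
xw <- stats::filter(x, filter = c(1, -phi), method = "convolution", sides = 1)
yw <- stats::filter(y, filter = c(1, -phi), method = "convolution", sides = 1)

# Step 3: sample CCF of the filtered series (the first filtered value is NA)
cc <- ccf(as.numeric(xw)[-1], as.numeric(yw)[-1], lag.max = 10, plot = FALSE)
peak.lag <- cc$lag[which.max(abs(cc$acf))]
peak.lag
```

In R, the lag k value of ccf(x, y) estimates the correlation between x at time t + k and y at time t, so a peak at a negative lag indicates that x leads y. A deliberately low AR order is used here; as the Denver example shows, a very long AR filter on a short series can leave little power to detect a real relationship.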


Exhibit 11.24 Logarithms of Monthly Public Transit Boardings and Gasoline Prices
in Denver, August 2000 through March 2006

[Time series plots of the two log-transformed series against Time, 2001 through 2006]

> data(boardings)
> plot(boardings,yax.flip=T)

Exhibit 11.25 Sample CCF of Prewhitened Log(Boardings) and Log(Price)

[Sample CCF plotted against Lag, from -1.0 to 1.0]

> m1=arima(boardings[,2],order=c(2,1,0))
> prewhiten(x=boardings[,2],y=boardings[,1],x.model=m1)


Based on the sample ACF, PACF, and EACF of the residuals from a linear model of
boardings on gasoline price, a seasonal ARIMA(2,0,0)$\times$(1,0,0)$_{12}$ model was tentatively
specified for the error process in the regression model. However, the $\phi_2$ coefficient
estimate was not significant, and hence the AR order was reduced to p = 1. Using the outlier
detection techniques discussed in Section 11.2, we found an additive outlier for March
2003 and an innovative outlier for March 2004. Because the test statistic for the additive
outlier had a larger magnitude than that of the innovative outlier (-4.09 vs. 3.65), we
incorporated the additive outlier in the model.† Diagnostics of the subsequent fitted
model reveal that the residual ACF was significant at lag 3, which suggests that the error
process is a seasonal ARIMA(1,0,3)$\times$(1,0,0)$_{12}$ + outlier process. As the estimates of
the coefficients $\theta_1$ and $\theta_2$ were found to be insignificant, they were suppressed from the
final fitted model that is reported in Exhibit 11.26.

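Diagnostics of the final fitted model suggest a good fit to the data. Also, no further
outliers were detected. A 95% confidence interval for the regression coefficient on
Log(Price) is (0.0249, 0.139). Note the interpretation of the fitted model: because both
variables are in logarithms, the coefficient is an elasticity, so a 1% increase in the price
of gasoline is associated with about a 0.08% increase in public transportation usage.
Extrapolating this elasticity linearly, a 100% increase in the price of gasoline
corresponds to roughly an 8.2% increase in public transportation usage.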
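
A short worked calculation may make the interpretation concrete. It assumes that the coefficient 0.0819 from Exhibit 11.26 is read as a constant elasticity; the symbols B (boardings), P (price), and c (a constant) are introduced here only for illustration. The confidence interval is checked first, followed by the exact implied change for a 1% increase and for a doubling of price.

```latex
% 95% confidence interval from the estimate and its standard error
0.0819 \pm 1.96 \times 0.0291 = 0.0819 \pm 0.0570 \;\Rightarrow\; (0.0249,\ 0.139)

% Log-log model: \log B = c + 0.0819 \log P, so B is proportional to P^{0.0819}
\frac{B_{\text{new}}}{B_{\text{old}}} = \left(\frac{P_{\text{new}}}{P_{\text{old}}}\right)^{0.0819}

% A 1% price increase
1.01^{0.0819} - 1 \approx 0.0008 \quad (\text{about } 0.08\%)

% A 100% price increase (doubling), computed exactly
2^{0.0819} - 1 = e^{0.0819 \ln 2} - 1 \approx e^{0.0568} - 1 \approx 0.058 \quad (\text{about } 5.8\%)
```

The 8.2% figure is therefore the first-order approximation; under a strictly constant elasticity, a doubling of price implies an increase closer to 5.8%.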


† Subsequent investigation revealed that a 30 inch snowstorm in March 2003 completely shut
down Denver for one full day. It remained partially shut down for a few more days.

Exhibit 11.26 Maximum Likelihood Estimates of the Regression Model of
Log(Boardings) on Log(Price) with ARMA Errors

Parameter        $\phi_1$   $\theta_3$   $\Phi_1$   Intercept   Log(Price)   Outlier
Estimate         0.8782     0.3836       0.8987     12.12       0.0819       -0.0643
Standard Error   0.0645     0.1475       0.0395     0.1638      0.0291       0.0109

$\sigma^2$ estimated as 0.0004094: log-likelihood = 158.02, AIC = -304.05

> log.boardings=boardings[,1]
> log.price=boardings[,2]
> boardings.m1=arima(log.boardings,order=c(1,0,0),
   seasonal=list(order=c(1,0,0),period=12),
   xreg=data.frame(log.price))
> boardings.m1
> detectAO(boardings.m1); detectIO(boardings.m1)
> boardings.m2=arima(log.boardings,order=c(1,0,3),
   seasonal=list(order=c(1,0,0),period=12),
   xreg=data.frame(log.price,outlier=c(rep(0,31),1,rep(0,36))),
   fixed=c(NA,0,0,rep(NA,5)))
> boardings.m2
> detectAO(boardings.m2); detectIO(boardings.m2)
> tsdiag(boardings.m2,tol=.15,gof.lag=24)


It  is  also  of  interest  to  note  that  dropping  the  outlier  term  from  the  model  results  in 
a  new  regression  estimate  on  Log(Price)  of  0.0619  with  a  standard  error  of  0.0372. 
Thus,  when  the  outlier  is  not  properly  modeled,  the  regression  coefficient  ceases  to  be 
significant  at  the  5%  level.  As  demonstrated  by  this  example,  the  presence  of  an  outlier 
can  adversely  affect  inference  in  time  series  modeling. 

11.5  Summary 


In this chapter, we used information from other events or other time series to help model
the time series of main interest. We began with the so-called intervention models, which
attempt to incorporate known external events that we believe have a significant effect on
the time series of interest. Various simple but useful ways of modeling the effects of
these interventions were discussed. Outliers are observations that deviate rather
substantially from the general pattern of the data. Models were developed to detect and
incorporate outliers in time series. The material in the section on spurious correlation
illustrates how difficult it is to assess relationships between two time series, but methods
involving prewhitening were shown to help in this regard. Several substantial examples
were used to illustrate the methods and techniques discussed.

Exercises


11.1  Produce  a  time  series  plot  of  the  air  passenger  miles  over  the  period  January  1996 
through  May  2005  using  seasonal  plotting  symbols.  Display  the  graph  full-screen 
and  discuss  the  seasonality  that  is  displayed.  The  data  are  in  the  file  named 
airmiles. 

11.2 Show that the expression given for $m_t$ in Equation (11.1.7) on page 251 satisfies
the “AR(1)” recursion given in Equation (11.1.6) with the initial condition $m_0 = 0$.

11.3 Find the “half-life” for the intervention effect specified in Equation (11.1.6) on
page 251 when $\delta = 0.7$.

11.4 Show that the “half-life” for the intervention effect specified in Equation (11.1.6)
on page 251 increases without bound as $\delta$ increases to 1.

11.5 Show that for the intervention effect specified by Equation (11.1.6) on page 251

$$\lim_{\delta \to 1} m_t = \begin{cases} \omega (t - T), & \text{for } t > T \\ 0, & \text{otherwise} \end{cases}$$

11.6 Consider the intervention effect displayed in Exhibit 11.3, (b), page 253.

(a) Show that the jump at time T + 1 is of height $\omega$ as displayed.

(b) Show that, as displayed, the intervention effect tends to $\omega/(1 - \delta)$ as t
increases without bound.

11.7 Consider the intervention effect displayed in Exhibit 11.3, (c), page 253. Show
that the effect increases linearly starting at time T + 1 with slope $\omega$ as displayed.

11.8 Consider the intervention effect displayed in Exhibit 11.4, (a), page 254.

(a) Show that the jump at time T + 1 is of height $\omega$ as displayed.

(b) Show that, as displayed, the intervention effect tends to go back to 0 as t
increases without bound.

11.9 Consider the intervention effect displayed in Exhibit 11.4, (b), page 254.

(a) Show that the jump at time T + 1 is of height $\omega_1 + \omega_2$ as displayed.

(b) Show that, as displayed, the intervention effect tends to $\omega_2$ as t increases
without bound.

11.10 Consider the intervention effect displayed in Exhibit 11.4, (c), page 254.

(a) Show that the jump at time T is of height $\omega_0$ as displayed.

(b) Show that the jump at time T + 1 is of height $\omega_1 + \omega_2$ as displayed.

(c) Show that, as displayed, the intervention effect tends to $\omega_2$ as t increases
without bound.

11.11 Simulate 100 pairs of $(X_t, Y_t)$ from the model of Equation (11.3.1) on page 261
with d = 3, $\beta_0 = 0$, and $\beta_1 = 1$. Use $\sigma_X = 2$ and $\sigma_e = 1$. Display and interpret the
sample CCF between these two series.

11.12 Show that when the X and Y are independent AR(1) time series with parameters
$\phi_X$ and $\phi_Y$, respectively, Equation (11.3.5) on page 262 reduces to give Equation
(11.3.6).

11.13 Show that for the process defined by Equation (11.4.5) on page 266, the
cross-correlation between $\tilde{X}$ and $\tilde{Y}$ at lag k is given by $\beta_{-k}(\sigma_{\tilde{X}}/\sigma_{\tilde{Y}})$.

11.14 Simulate an AR(1) time series with $\phi = 0.7$, $\mu = 0$, $\sigma_e = 1$, and of length n = 48. Plot
the time series, and inspect the sample ACF and PACF of the series.

(a) Now add a step function response of $\omega = 1$ unit height at time T = 36 to the
simulated series. The series now has a theoretical mean of zero from t = 1 to
35 and a mean of 1 from t = 36 on. Plot the new time series and calculate the
sample ACF and PACF for the new series. Compare these with the results for
the original series.

(b) Repeat part (a) but with an impulse response at time T = 36 of unit height, $\omega = 1$.
Plot the new time series, and calculate the sample ACF and PACF for the
new series. Compare these with the results for the original series. See if you
can detect the additive outlier at time t = 36 assuming that you do not know
where the outlier might occur.

11.15 Consider the air passenger miles time series discussed in this chapter. The file is
named airmiles. Use only the preintervention data (that is, data prior to September
2001) for this exercise.

(a) Verify that the sample ACF for the twice differenced series of the logarithms
of the preintervention data is as shown in Exhibit 11.5 on page 255.

(b) The plot created in part (a) suggests an ARIMA(0,1,1)$\times$(0,1,0)$_{12}$ model. Fit this
model and assess its adequacy. In particular, verify that additive outliers are
detected in December 1996, January 1997, and December 2002.

(c) Now fit an ARIMA(0,1,1)$\times$(0,1,0)$_{12}$ + three outliers model and assess its
adequacy.

(d) Finally, fit an ARIMA(0,1,1)$\times$(0,1,1)$_{12}$ + three outliers model and assess its
adequacy.

11.16 Use the logarithms of the Denver region public transportation boardings and
Denver gasoline price series. The data are in the file named boardings.

(a) Display the time series plot of the monthly boardings using seasonal plotting
symbols. Interpret the plot.

(b) Display the time series plot of the monthly average gasoline prices using
seasonal plotting symbols. Interpret the plot.

11.17 The data file named deere1 contains 82 consecutive values for the amount of
deviation (in 0.000025 inch units) from a specified target value that an industrial
machining process at Deere & Co. produced under certain specified operating
conditions. These data were first used in Exercise 6.33, page 146, where we
observed an obvious outlier at time t = 27.

(a) Fit an AR(2) model using the original data including the outlier.

(b) Test the fitted AR(2) model of part (a) for both AO and IO outliers.

(c) Now fit the AR(2) model incorporating a term in the model for the outlier.

(d) Assess the fit of the model in part (c) using all of our diagnostic tools. In
particular, compare the properties of this model with the one obtained in part (a).




11.18 The data file named days contains accounting data from the Winegard Co. of
Burlington, Iowa. The data are the number of days until Winegard receives payment
for 130 consecutive orders from a particular distributor of Winegard products.
(The name of the distributor must remain anonymous for confidentiality reasons.)
These data were first investigated in Exercise 6.39, page 147, but several outliers
were observed. When the observed outliers were replaced by more typical values,
an MA(2) model was suggested.

(a) Fit an MA(2) model to the original data, and test the fitted model for both AO
and IO outliers.

(b) Now fit the MA(2) model incorporating the outliers into the model.

(c) Assess the fit of the model obtained in part (b). In particular, are any more
outliers indicated?

(d) Fit another MA(2) model incorporating any additional outliers found in part
(c), and assess the fit of this model.

11.19  The  data  file  named  bluebirdlite  contains  weekly  sales  and  price  data  for  Bluebird 
Lite  potato  chips.  Carry  out  an  analysis  similar  to  that  for  Bluebird  Standard 
potato  chips  that  was  begun  on  page  267. 

11.20  The  file  named  units  contains  annual  unit  sales  of  a  certain  product  from  a  widely 
known  international  company  over  the  years  1983  through  2005.  (The  name  of 
the  company  must  remain  anonymous  for  proprietary  reasons.) 

(a)  Plot  the  time  series  of  units  and  describe  the  general  features  of  the  plot. 

(b)  Use  ordinary  least  squares  regression  to  fit  a  straight  line  in  time  to  the  series. 

(c)  Display  the  sample  PACF  of  the  residuals  from  this  model,  and  specify  an 
ARIMA  model  for  the  residuals. 

(d) Now fit the model unit sales = AR(2) + time. Interpret the output. In
particular, compare the estimated regression coefficient on the time variable obtained
here with the one you obtained in part (b).

(e) Perform a thorough analysis of the residuals from this last model.

(f) Repeat parts (d) and (e) using the logarithms of unit sales as the response
variable. Compare these results with those obtained in parts (d) and (e).

11.21 In Chapters 5-8, we investigated an IMA(1,1) model for the logarithms of
monthly oil prices. Exhibit 8.3 on page 178 suggested that there may be several
outliers in this series. Investigate the IMA(1,1) model for this series for outliers
using the techniques developed in this chapter. Be sure to compare your results
with those obtained earlier that ignored the outliers. The data are in the file named
oil.


Chapter  12 

Time  Series  Models  of 
Heteroscedasticity 


The models discussed so far concern the conditional mean structure of time series data.
However, more recently, there has been much work on modeling the conditional
variance structure of time series data, mainly motivated by the needs of financial
modeling. Let $\{Y_t\}$ be a time series of interest. The conditional variance of $Y_t$ given the past Y
values, $Y_{t-1}, Y_{t-2}, \ldots$, measures the uncertainty in the deviation of $Y_t$ from its
conditional mean $E(Y_t | Y_{t-1}, Y_{t-2}, \ldots)$. If $\{Y_t\}$ follows some ARIMA model, the
(one-step-ahead) conditional variance is always equal to the noise variance for any present and
past values of the process. Indeed, the constancy of the conditional variance is true for
predictions of any fixed number of steps ahead for an ARIMA process. In practice, the
(one-step-ahead) conditional variance may vary with the current and past values of the
process, and, as such, the conditional variance is itself a random process, often referred
to as the conditional variance process. For example, daily returns of stocks are often
observed to have larger conditional variance following a period of violent price
movement than following a relatively stable period. The development of models for the
conditional variance process with which we can predict the variability of future values based on
current and past data is the main concern of the present chapter. In contrast, the ARIMA
models studied in earlier chapters focus on how to predict the conditional mean of future
values based on current and past data.

In  finance,  the  conditional  variance  of  the  return  of  a  financial  asset  is  often  adopted 
as  a  measure  of  the  risk  of  the  asset.  This  is  a  key  component  in  the  mathematical  theory 
of  pricing  a  financial  asset  and  the  VaR  (Value  at  Risk)  calculations;  see,  for  example, 
Tsay  (2005).  In  an  efficient  market,  the  expected  return  (conditional  mean)  should  be 
zero,  and  hence  the  return  series  should  be  white  noise.  Such  series  have  the  simplest 
autocorrelation  structure.  Thus,  for  ease  of  exposition,  we  shall  assume  in  the  first  few 
sections  of  this  chapter  that  the  data  are  returns  of  some  financial  asset  and  are  white 
noise;  that  is,  serially  uncorrelated  data.  By  doing  so,  we  can  concentrate  initially  on 
studying how to model the conditional variance structure of a time series. By the end of
the chapter, we discuss some simple schemes for simultaneously modeling the
conditional mean and conditional variance structure by combining an ARIMA model with a
model of conditional heteroscedasticity.

12.1 Some Common Features of Financial Time Series


As an example of financial time series, we consider the daily values of a unit of the
CREF stock fund over the period from August 26, 2004 to August 15, 2006. The CREF
stock fund is a fund of several thousand stocks and is not openly traded in the stock
market.† Since stocks are not traded over weekends or on holidays, only on so-called
trading days, the CREF data do not change over weekends and holidays. For simplicity, we
will analyze the data as if they were equally spaced. Exhibit 12.1 shows the time series
plot of the CREF data. It shows a generally increasing trend with a hint of higher
variability with higher level of the stock value. Let $\{p_t\}$ be the time series of, say, the daily
price of some financial asset. The (continuously compounded) return on the tth day is
defined as

$$r_t = \log(p_t) - \log(p_{t-1}) \qquad (12.1.1)$$

Sometimes the returns are then multiplied by 100 so that they can be interpreted as
percentage changes in the price. The multiplication may also reduce numerical errors, as the
raw returns could be very small numbers and cause large rounding errors in some
calculations.
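
As a small illustration of Equation (12.1.1), the sketch below computes percentage log returns from a short vector of made-up prices (the values in p are hypothetical, not CREF data) and compares them with ordinary percentage changes. The same diff(log(...))*100 idiom is used for the CREF series in this section.

```r
# Hypothetical daily prices, for illustration only
p <- c(100, 101, 99.5, 100.2)

# Continuously compounded returns, Equation (12.1.1), multiplied by 100
r <- diff(log(p)) * 100

# Ordinary percentage changes, for comparison
simple <- 100 * diff(p) / p[-length(p)]

round(cbind(log.return = r, pct.change = simple), 3)

# Log returns add up over time: their sum is the log return over the whole period
sum(r)
100 * log(p[length(p)] / p[1])
```

For small daily movements the two measures are nearly identical, which is why the scaled log returns can be read as approximate percentage changes.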


Exhibit 12.1 Daily CREF Stock Values: August 26, 2004 to August 15, 2006

> win.graph(width=4.875,height=2.5,pointsize=8)
> data(CREF); plot(CREF)


Exhibit 12.2 plots the CREF return series (sample size = 500). The plot shows that
the returns were more volatile over some time periods and became very volatile toward
the end of the study period. This observation may be seen more clearly in a
time sequence plot of the absolute or squared returns; see Exercise 12.1, page 316.
These results might be triggered by the instability in the Middle East due to a war in
southern Lebanon from July 12 to August 14, 2006, the period that is shaded in gray in
Exhibits 12.1 and 12.2. This pattern of alternating quiet and volatile periods of
substantial duration is referred to as volatility clustering in the literature. Volatility in a time
series refers to the phenomenon where the conditional variance of the time series varies
over time. The study of the dynamical pattern in the volatility of a time series (that is,
the conditional variance process of the time series) constitutes the main subject of this
chapter.

† CREF stands for College Retirement Equities Fund, a group of stock and bond funds
crucial to many college faculty retirement plans.


Exhibit 12.2 Daily CREF Stock Returns: August 26, 2004 to August 15, 2006

[Time series plot of the returns against Time]

> r.cref=diff(log(CREF))*100
> plot(r.cref); abline(h=0)


The sample ACF and PACF of the daily CREF returns (multiplied by 100), shown
in Exhibits 12.3 and 12.4, suggest that the returns have little serial correlation at all. The
sample EACF (not shown) also suggests that a white noise model is appropriate for
these data. The average CREF return equals 0.0493 with a standard error of 0.02885.
Thus the mean of the return process is not statistically significantly different from zero.
This is expected based on the efficient-market hypothesis alluded to in the introduction
to this chapter.

Exhibit 12.3 Sample ACF of Daily CREF Returns: 8/26/04 to 8/15/06

[Sample ACF plotted against Lag, 0 through 25]

> acf(r.cref)

Exhibit 12.4 Sample PACF of Daily CREF Returns: 8/26/04 to 8/15/06

[Sample PACF plotted against Lag, 0 through 25]

> pacf(r.cref)


However, the volatility clustering observed in the CREF return data gives us a hint
that they may not be independently and identically distributed; otherwise the variance
would be constant over time. This is the first occasion in our study of time series models
where we need to distinguish between series values being uncorrelated and series values
being independent. If series values are truly independent, then nonlinear instantaneous
transformations such as taking logarithms, absolute values, or squaring preserve
independence. However, the same is not true of correlation, as correlation is only a measure
of linear dependence. Higher-order serial dependence structure in data can be explored
by studying the autocorrelation structure of the absolute returns (of lesser sampling
variability with less mathematical tractability) or that of the squared returns (of greater
sampling variability but with more manageability in terms of statistical theory). If the
returns are independently and identically distributed, then so are the absolute returns (as
are the squared returns), and hence they will be white noise as well. Hence, if the
absolute or squared returns admit some significant autocorrelations, then these
autocorrelations furnish some evidence against the hypothesis that the returns are independently
and identically distributed. Indeed, the sample ACF and PACF of the absolute returns
and those of the squared returns in Exhibits 12.5 through 12.8 display some significant
autocorrelations and hence provide some evidence that the daily CREF returns are not
independently and identically distributed.


Exhibit 12.5 Sample ACF of the Absolute Daily CREF Returns

[Sample ACF plotted against Lag]

> acf(abs(r.cref))

Exhibit 12.6 Sample PACF of the Absolute Daily CREF Returns

[Sample PACF plotted against Lag]

> pacf(abs(r.cref))

Exhibit 12.7 Sample ACF of the Squared Daily CREF Returns

[Sample ACF plotted against Lag]

> acf(r.cref^2)

Exhibit 12.8 Sample PACF of the Squared Daily CREF Returns

> pacf(r.cref^2)


These visual tools are often supplemented by formally testing whether the squared
data are autocorrelated using the Box-Ljung test. Because no model fitting is required,
the degrees of freedom of the approximating chi-square distribution for the Box-Ljung
statistic equals the number of correlations used in the test. Hence, if we use m
autocorrelations of the squared data in the test, the test statistic is approximately chi-square
distributed with m degrees of freedom if there is no ARCH. This approach can be extended
to the case when the conditional mean of the process is non-zero, provided that an ARMA
model is adequate in describing the autocorrelation structure of the data. In that case,
the first m autocorrelations of the squared residuals from this model can be used to test
for the presence of ARCH. The corresponding Box-Ljung statistic will have approximately a
chi-square distribution with m degrees of freedom under the assumption of no ARCH
effect; see McLeod and Li (1983) and Li (2004). Below, we shall refer to the test for
ARCH effects using the Box-Ljung statistic with the squared residuals or data as the
McLeod-Li test.
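
The idea behind the test can be sketched with base R's Box.test applied to squared values. The example below uses simulated data rather than the CREF returns: a series is generated whose conditional variance depends on the previous squared value, and the parameter values, the seed, and the choice of 10 lags are all assumptions made for illustration. Because the series is used directly and no model is fitted, no degrees of freedom are subtracted.

```r
set.seed(123)
n <- 1000
omega <- 0.5   # hypothetical constant term of the conditional variance
alpha <- 0.5   # hypothetical weight on the previous squared value

# Simulate a series whose conditional variance is omega + alpha * r[t-1]^2
r <- numeric(n)
sig2 <- omega / (1 - alpha)          # start at the stationary variance
for (t in 1:n) {
  if (t > 1) sig2 <- omega + alpha * r[t - 1]^2
  r[t] <- sqrt(sig2) * rnorm(1)
}

# Box-Ljung test on the series itself and on its squares, using m = 10 lags
bt.raw <- Box.test(r, lag = 10, type = "Ljung-Box")
bt.sq  <- Box.test(r^2, lag = 10, type = "Ljung-Box")

bt.raw$p.value   # tests serial correlation in the series itself
bt.sq$p.value    # tests serial correlation in the squares (the McLeod-Li idea)
bt.sq$parameter  # degrees of freedom, equal to the number of lags used
```

A small p-value for the squared series, whatever the result for the raw series, is the pattern the McLeod-Li test is designed to pick up.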

In practice, it is useful to apply the McLeod-Li test for ARCH using a number of
lags and plot the p-values of the test. Exhibit 12.9 shows that the McLeod-Li tests are all
significant at the 5% significance level when more than 3 lags are included in the test.
This is broadly consistent with the visual pattern in Exhibit 12.7 and formally shows
strong evidence for ARCH in these data.


Exhibit 12.9 McLeod-Li Test Statistics for Daily CREF Returns

[Plot of the McLeod-Li test p-values against Lag]

> win.graph(width=4.875,height=3,pointsize=8)
> McLeod.Li.test(y=r.cref)


The distributional shape of the CREF returns can be explored by constructing a QQ
normal scores plot; see Exhibit 12.10. The QQ plot suggests that the distribution of
returns may have a tail thicker than that of a normal distribution and may be somewhat
skewed to the right. Indeed, the Shapiro-Wilk test statistic for testing normality equals
0.9932 with p-value equal to 0.024, and hence we reject the normality hypothesis at the
5% significance level.

Exhibit 12.10 QQ Normal Plot of Daily CREF Returns

[Sample quantiles plotted against Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(r.cref); qqline(r.cref)

The skewness of a random variable, say Y, is defined by $E(Y - \mu)^3/\sigma^3$, where $\mu$ and $\sigma$
are the mean and standard deviation of Y, respectively. It can be estimated by the sample
skewness

$$g_1 = \sum_{i=1}^{n} (Y_i - \bar{Y})^3 / (n\hat{\sigma}^3) \qquad (12.1.2)$$

where $\hat{\sigma}^2 = \sum (Y_i - \bar{Y})^2 / n$ is the sample variance. The sample skewness of the CREF
returns equals 0.116. The thickness of the tail of a distribution relative to that of a
normal distribution is often measured by the (excess) kurtosis, defined as $E(Y - \mu)^4/\sigma^4 - 3$.
For normal distributions, the kurtosis is always equal to zero. A distribution with
positive kurtosis is called a heavy-tailed distribution, whereas it is called light-tailed if its
kurtosis is negative. The kurtosis can be estimated by the sample kurtosis

$$g_2 = \sum_{i=1}^{n} (Y_i - \bar{Y})^4 / (n\hat{\sigma}^4) - 3 \qquad (12.1.3)$$

The sample kurtosis of the CREF returns equals 0.6274. An alternative definition of
kurtosis modifies the formula and uses $E(r_t - \mu)^4/\sigma^4$; that is, it does not subtract three
from the ratio. We shall always use the former definition for kurtosis.

Another test for normality is the Jarque-Bera test, which is based on the fact that a
normal distribution has zero skewness and zero kurtosis. Assuming independently and
identically distributed data $Y_1, Y_2, \ldots, Y_n$, the Jarque-Bera test statistic is defined as

$$JB = \frac{n g_1^2}{6} + \frac{n g_2^2}{24} \qquad (12.1.4)$$

where $g_1$ is the sample skewness and $g_2$ is the sample kurtosis. Under the null
hypothesis of normality, the Jarque-Bera test statistic is approximately distributed as $\chi^2$ with
two degrees of freedom. In fact, under the normality assumption, each summand
defining the Jarque-Bera statistic is approximately $\chi^2$ with 1 degree of freedom. The
Jarque-Bera test rejects the normality assumption if the test statistic is too large. For the
CREF returns, JB = $500 \times 0.116^2/6 + 500 \times 0.6274^2/24$ = 1.12 + 8.20 = 9.32 with a
p-value equal to 0.011. Recall that the upper 5 percentage point of a $\chi^2$ distribution with
unit degree of freedom equals 3.84. Hence, the data appear not to be skewed but do have
a relatively heavy tail. In particular, the normality assumption is inconsistent with the
CREF return data, a conclusion that is also consistent with the finding of the
Shapiro-Wilk test.
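
Equations (12.1.2) through (12.1.4) translate directly into a few lines of R. The sketch below is one possible implementation; the function name jb.stats and the toy data are hypothetical and are not part of any package used in the book. It also repeats the arithmetic in the text starting from the rounded values reported there, so a p-value computed this way need not agree with the reported one to the last digit.

```r
# Sample skewness, sample (excess) kurtosis, and the Jarque-Bera statistic
jb.stats <- function(y) {
  n  <- length(y)
  d  <- y - mean(y)
  s2 <- sum(d^2) / n                    # sample variance with divisor n
  g1 <- sum(d^3) / (n * s2^1.5)         # Equation (12.1.2)
  g2 <- sum(d^4) / (n * s2^2) - 3       # Equation (12.1.3)
  jb <- n * g1^2 / 6 + n * g2^2 / 24    # Equation (12.1.4)
  c(g1 = g1, g2 = g2, JB = jb,
    p.value = pchisq(jb, df = 2, lower.tail = FALSE))
}

# Arithmetic from the text, using the rounded values n = 500, g1 = 0.116, g2 = 0.6274
jb.book <- 500 * 0.116^2 / 6 + 500 * 0.6274^2 / 24   # about 9.32
p.book  <- pchisq(jb.book, df = 2, lower.tail = FALSE)

# Upper 5 percentage point of chi-square with 1 degree of freedom (about 3.84)
qchisq(0.95, df = 1)

# A small symmetric toy sample (hypothetical data): skewness is exactly zero
toy <- jb.stats(c(-2, -1, 0, 1, 2))
toy
```

Comparing each summand of the statistic with the chi-square critical value on 1 degree of freedom is what supports the reading that the kurtosis term, rather than the skewness term, drives the rejection.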

In summary, the CREF return data are found to be serially uncorrelated but admit a
higher-order dependence structure, namely volatility clustering, and a heavy-tailed
distribution. It is commonly observed that such characteristics are rather prevalent among
financial time series data. The GARCH models introduced in the next sections attempt
to provide a framework for modeling and analyzing time series that display some of
these characteristics.

12.2  The  ARCH(1)  Model 


Engle (1982) first proposed the autoregressive conditional heteroscedasticity (ARCH)
model for modeling the changing variance of a time series. As discussed in the previous
section, the return series of a financial asset, say $\{r_t\}$, is often a serially uncorrelated
sequence with zero mean, even as it exhibits volatility clustering. This suggests that the
conditional variance of $r_t$ given past returns is not constant. The conditional variance,
also referred to as the conditional volatility, of $r_t$ will be denoted by $\sigma^2_{t|t-1}$, with the
subscript t - 1 signifying that the conditioning is upon returns through time t - 1. When
$r_t$ is available, the squared return $r_t^2$ provides an unbiased estimator of $\sigma^2_{t|t-1}$. A series
of large squared returns may foretell a relatively volatile period. Conversely, a series of
small squared returns may foretell a relatively quiet period. The ARCH model is
formally a regression model with the conditional volatility as the response variable and the
past lags of the squared return as the covariates. For example, the ARCH(1) model
assumes that the return series $\{r_t\}$ is generated as follows:

$$r_t = \sigma_{t|t-1}\varepsilon_t \qquad (12.2.1)$$

$$\sigma^2_{t|t-1} = \omega + \alpha r^2_{t-1} \qquad (12.2.2)$$

where $\alpha$ and $\omega$ are unknown parameters, $\{\varepsilon_t\}$ is a sequence of independently and
identically distributed random variables each with zero mean and unit variance (also known
as the innovations), and $\varepsilon_t$ is independent of $r_{t-j}$, $j = 1, 2, \ldots$. The innovation $\varepsilon_t$ is
presumed to have unit variance so that the conditional variance of $r_t$ equals $\sigma^2_{t|t-1}$. This
follows from

$$
\begin{aligned}
E(r_t^2 \mid r_{t-j},\, j = 1, 2, \ldots)
  &= E(\sigma^2_{t|t-1}\varepsilon_t^2 \mid r_{t-j},\, j = 1, 2, \ldots) \\
  &= \sigma^2_{t|t-1}\, E(\varepsilon_t^2 \mid r_{t-j},\, j = 1, 2, \ldots) \\
  &= \sigma^2_{t|t-1}\, E(\varepsilon_t^2) \\
  &= \sigma^2_{t|t-1}
\end{aligned}
\tag{12.2.3}
$$

The second equality follows because $\sigma_{t|t-1}$ is known given the past returns, the third
equality holds because $\varepsilon_t$ is independent of past returns, and the last equality results
from the assumption that the variance of $\varepsilon_t$ equals 1.

Exhibit 12.11 shows the time series plot of a simulated series of size 500 from an
ARCH(1) model with $\omega = 0.01$ and $\alpha = 0.9$. Volatility clustering is evident in the data as
larger fluctuations cluster together, although the series is able to recover from large
fluctuations quickly because of the very short memory in the conditional variance process.¹


Exhibit 12.11 Simulated ARCH(1) Model with $\omega = 0.01$ and $\alpha_1 = 0.9$

> library(TSA)  # garch.sim() is provided by the TSA package
> set.seed(1235678); library(tseries)
> garch01.sim=garch.sim(alpha=c(.01,.9),n=500)
> plot(garch01.sim,type='l',ylab=expression(r[t]),xlab='t')
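
For readers who want to see the mechanics of Equations (12.2.1) and (12.2.2) without any package, the following base R sketch simulates an ARCH(1) series directly from the definition. The function name `simulate_arch1`, the burn-in length, and the seed are illustrative assumptions and are not taken from the exhibit, so the simulated path will differ from the one plotted there.

```r
# Simulate an ARCH(1) series directly from Equations (12.2.1)-(12.2.2).
# Assumptions: standard normal innovations and a burn-in period; both are
# illustrative choices, not a description of what garch.sim() does.
simulate_arch1 <- function(n, omega, alpha, burn = 100) {
  total  <- n + burn
  eps    <- rnorm(total)             # innovations: iid, mean 0, variance 1
  r      <- numeric(total)
  sigma2 <- numeric(total)
  sigma2[1] <- omega / (1 - alpha)   # start at the stationary variance
  r[1]      <- sqrt(sigma2[1]) * eps[1]
  for (t in 2:total) {
    sigma2[t] <- omega + alpha * r[t - 1]^2   # Equation (12.2.2)
    r[t]      <- sqrt(sigma2[t]) * eps[t]     # Equation (12.2.1)
  }
  keep <- (burn + 1):total           # discard the burn-in observations
  list(r = r[keep], sigma2 = sigma2[keep])
}

set.seed(1)
sim <- simulate_arch1(n = 500, omega = 0.01, alpha = 0.9)
plot(sim$r, type = "l", ylab = expression(r[t]), xlab = "t")
```

Returning the conditional variances alongside the returns makes it easy to overlay the latent volatility on the plot, something that is not possible with real data.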


While the ARCH model resembles a regression model, the fact that the conditional
variance is not directly observable (and hence is called a latent variable) introduces
some subtlety in the use of ARCH models in data analysis. For example, it is not obvious
how to explore the regression relationship graphically. To do so, it is pertinent to
replace the conditional variance by some observable in Equation (12.2.2). Let

$$\eta_t = r_t^2 - \sigma^2_{t|t-1} \tag{12.2.4}$$


1 The R package named tseries is required for this chapter. We assume that the reader has
downloaded and installed it.

It can be verified that $\{\eta_t\}$ is a serially uncorrelated series with zero mean. Moreover, $\eta_t$
is uncorrelated with past returns. Substituting $\sigma^2_{t|t-1} = r_t^2 - \eta_t$ into Equation (12.2.2),
it is obvious that

$$r_t^2 = \omega + \alpha r^2_{t-1} + \eta_t \tag{12.2.5}$$

Thus, the squared return series satisfies an AR(1) model under the assumption of an
ARCH(1) model for the return series! Based on this useful observation, an ARCH(1)
model may be specified if an AR(1) specification for the squared returns is warranted by
techniques learned from earlier chapters.
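
The AR(1) structure of the squared returns can be checked numerically. The sketch below simulates an ARCH(1) series in base R and regresses each squared return on the previous one. The parameter values, the seed, and the sample size are assumptions chosen for illustration ($\alpha = 0.3$ is used so that the fourth moment of the returns is finite).

```r
# Squared ARCH(1) returns as an AR(1) series: a numerical illustration.
# Assumptions: omega = 0.01, alpha = 0.3, normal innovations, n = 5000.
set.seed(42)
n <- 5000; omega <- 0.01; alpha <- 0.3
r <- numeric(n); sigma2 <- numeric(n)
sigma2[1] <- omega / (1 - alpha)
r[1] <- sqrt(sigma2[1]) * rnorm(1)
for (t in 2:n) {
  sigma2[t] <- omega + alpha * r[t - 1]^2
  r[t]      <- sqrt(sigma2[t]) * rnorm(1)
}

eta <- r^2 - sigma2            # eta_t = r_t^2 - sigma^2_{t|t-1}

y <- r[-1]^2                   # r_t^2       for t = 2, ..., n
x <- r[-n]^2                   # r_{t-1}^2   for t = 2, ..., n

# Equation (12.2.5) holds exactly by construction
identity_ok <- isTRUE(all.equal(y, omega + alpha * x + eta[-1]))

# Least-squares AR(1) fit; the slope is a rough estimate of alpha
fit <- lm(y ~ x)
coef(fit)
```

Because $\eta_t$ is heavy-tailed, the least-squares slope can be quite variable from one simulation to the next, which is one reason likelihood-based estimation is preferred in practice.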

Besides its value in terms of data analysis, the deduced AR(1) model for the
squared returns can be exploited to gain theoretical insights on the parameterization of
the ARCH model. For example, because the squared returns must be nonnegative, it
makes sense to always restrict the parameters $\omega$ and $\alpha$ to be nonnegative. Also, if the
return series is stationary with variance $\sigma^2$, then taking expectation on both sides of
Equation (12.2.5) yields

$$\sigma^2 = \omega + \alpha\sigma^2 \tag{12.2.6}$$

That is, $\sigma^2 = \omega/(1-\alpha)$ and hence $0 \le \alpha < 1$. Indeed, it can be shown (Ling and
McAleer, 2002) that the condition $0 \le \alpha < 1$ is necessary and sufficient for the (weak)
stationarity of the ARCH(1) model. At first sight, it seems that the concepts of stationarity
and conditional heteroscedasticity may be incompatible. However, recall that weak
stationarity of a process requires that the mean of the process be constant and the covariance
of the process at any two epochs be finite and identical whenever the lags of the
two epochs are the same. In particular, the variance is constant for a weakly stationary
process. The condition $0 \le \alpha < 1$ implies that there exists an initial distribution for $r_0$
such that $r_t$ defined by Equations (12.2.1) and (12.2.2) for $t \ge 1$ is weakly stationary in
the sense above. It is interesting to observe that weak stationarity does not preclude the
possibility of a nonconstant conditional variance process, as is the case for the ARCH(1)
model! It can be checked that the ARCH(1) process is white noise. Hence, it is an example
of a white noise that admits a nonconstant conditional variance process as defined
by Equation (12.2.2) that varies with the lag one of the squared process.
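
The white noise claim can be checked in a few lines. The following is a sketch of the argument, assuming stationarity and using only the independence of $\varepsilon_t$ from past returns together with $E(\varepsilon_t) = 0$ and $E(\varepsilon_t^2) = 1$; for any lag $k \ge 1$:

```latex
\begin{aligned}
E(r_t) &= E(\sigma_{t|t-1})\,E(\varepsilon_t) = 0, \\
\operatorname{Cov}(r_t, r_{t-k}) &= E(\sigma_{t|t-1}\, r_{t-k}\, \varepsilon_t)
   = E(\sigma_{t|t-1}\, r_{t-k})\,E(\varepsilon_t) = 0, \\
\operatorname{Var}(r_t) &= E(\sigma^2_{t|t-1})\,E(\varepsilon_t^2)
   = \omega + \alpha E(r_{t-1}^2) = \frac{\omega}{1-\alpha}.
\end{aligned}
```

The factorizations are valid because $\sigma_{t|t-1}$ and $r_{t-k}$ are functions of past returns only. The mean is zero, the variance is constant, and all autocovariances at nonzero lags vanish, which is the definition of white noise.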

A satisfying feature of the ARCH(1) model is that, even if the innovation $\varepsilon_t$ has a
normal distribution, the stationary distribution of an ARCH(1) model with $1 > \alpha > 0$ has
fat tails; that is, its kurtosis, $E(r_t^4)/\sigma^4 - 3$, is greater than zero. (Recall that the kurtosis
of a normal distribution is always equal to 0, and a distribution with positive kurtosis is
said to be fat-tailed, while one with a negative kurtosis is called a light-tailed distribution.)
To see the validity of this claim, consider the case where the $\{\varepsilon_t\}$ are independently
and identically distributed as standard normal variables. Raising both sides of
Equation (12.2.1) to the fourth power and taking expectations gives

$$
\begin{aligned}
E(r_t^4) &= E[E(\sigma^4_{t|t-1}\varepsilon_t^4 \mid r_{t-j},\, j = 1, 2, 3, \ldots)] \\
         &= E[\sigma^4_{t|t-1}\, E(\varepsilon_t^4 \mid r_{t-j},\, j = 1, 2, 3, \ldots)] \\
         &= E[\sigma^4_{t|t-1}\, E(\varepsilon_t^4)] \\
         &= 3E(\sigma^4_{t|t-1})
\end{aligned}
\tag{12.2.7}
$$

The first equality follows from the iterated-expectation formula, which, in the simple
case of two random variables $X$, $Y$, states that $E[E(X|Y)] = E(X)$. [See Equation (9.E.5)
for a review.] The second equality results from the fact that $\sigma_{t|t-1}$ is known
given past returns. The third equality is a result of the independence between $\varepsilon_t$ and past
returns, and the final equality follows from the normality assumption. It remains to
calculate $E(\sigma^4_{t|t-1})$. Now, it is unclear whether the preceding expectation exists as a finite
number. For the moment, assume it does and, assuming stationarity, let it be denoted by
$\tau$. Below, we shall derive a condition for this assumption to be valid. Raising both sides
of Equation (12.2.2) to the second power and taking expectation yields (using
$E(r^2_{t-1}) = \sigma^2$ and, by Equation (12.2.7), $E(r^4_{t-1}) = 3\tau$)

$$\tau = \omega^2 + 2\omega\alpha\sigma^2 + 3\alpha^2\tau \tag{12.2.8}$$

which implies

$$\tau = \frac{\omega^2 + 2\omega\alpha\sigma^2}{1 - 3\alpha^2} \tag{12.2.9}$$

This equality shows that a necessary (and, in fact, also sufficient) condition for the
finiteness of $\tau$ is that $0 \le \alpha < 1/\sqrt{3}$, in which case the ARCH(1) process has finite
fourth moment. Incidentally, this shows that a stationary ARCH(1) model need not have
finite fourth moments. The existence of finite higher moments will further restrict the
parameter range, a feature also shared by higher-order analogues of the ARCH model
and its variants. Returning to the calculation of the kurtosis of an ARCH(1) process, it
can be shown by tedious algebra that Equation (12.2.9) implies that $\tau > \sigma^4$ and hence
$E(r_t^4) > 3\sigma^4$. Thus the kurtosis of a stationary ARCH(1) process is greater than zero.
This verifies our earlier statement that an ARCH(1) process has fat tails even with normal
innovations. In other words, the fat tail is a result of the volatility clustering as specified
by Equation (12.2.2).
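
The algebra behind the inequality $\tau > \sigma^4$ is short enough to sketch. Assuming normal innovations and $0 < \alpha < 1/\sqrt{3}$, substituting $\sigma^2 = \omega/(1-\alpha)$ into the expression for $\tau$ gives:

```latex
\begin{aligned}
\omega^2 + 2\omega\alpha\sigma^2
  &= \omega^2\,\frac{(1-\alpha) + 2\alpha}{1-\alpha}
   = \frac{\omega^2(1+\alpha)}{1-\alpha}, \\
\frac{\tau}{\sigma^4}
  &= \frac{\omega^2(1+\alpha)}{(1-\alpha)(1-3\alpha^2)} \cdot \frac{(1-\alpha)^2}{\omega^2}
   = \frac{1-\alpha^2}{1-3\alpha^2} > 1, \\
\frac{E(r_t^4)}{\sigma^4} - 3
  &= \frac{3(1-\alpha^2)}{1-3\alpha^2} - 3
   = \frac{6\alpha^2}{1-3\alpha^2} > 0 .
\end{aligned}
```

The excess kurtosis therefore grows with $\alpha$ and diverges as $\alpha$ approaches $1/\sqrt{3}$; for example, $\alpha = 0.5$ gives an excess kurtosis of 6.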

A main use of the ARCH model is to predict the future conditional variances. For
example, one might be interested in forecasting the $h$-step-ahead conditional variance

$$\sigma^2_{t+h|t} = E(r^2_{t+h} \mid r_t, r_{t-1}, \ldots) \tag{12.2.10}$$

For $h = 1$, the ARCH(1) model implies that

$$\sigma^2_{t+1|t} = \omega + \alpha r_t^2 = (1-\alpha)\sigma^2 + \alpha r_t^2 \tag{12.2.11}$$

which is a weighted average of the long-run variance and the current squared return.
Similarly, using the iterated expectation formula, we have

$$
\begin{aligned}
\sigma^2_{t+h|t} &= E(r^2_{t+h} \mid r_t, r_{t-1}, \ldots) \\
  &= E[E(\sigma^2_{t+h|t+h-1}\varepsilon^2_{t+h} \mid r_{t+h-1}, r_{t+h-2}, \ldots) \mid r_t, r_{t-1}, \ldots] \\
  &= E[\sigma^2_{t+h|t+h-1} E(\varepsilon^2_{t+h}) \mid r_t, r_{t-1}, \ldots] \\
  &= E(\sigma^2_{t+h|t+h-1} \mid r_t, r_{t-1}, \ldots) \\
  &= \omega + \alpha E(r^2_{t+h-1} \mid r_t, r_{t-1}, \ldots) \\
  &= \omega + \alpha\sigma^2_{t+h-1|t}
\end{aligned}
\tag{12.2.12}
$$

where we adopt the convention that $\sigma^2_{t+h|t} = r^2_{t+h}$ for $h \le 0$. The formula above
provides a recursive recipe for computing the $h$-step-ahead conditional variance.
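
The recursion can be coded in a few lines. The sketch below is in base R; the function name `arch1_forecast` and the numerical inputs are illustrative assumptions, and the parameters are treated as known rather than estimated.

```r
# h-step-ahead conditional variance forecasts for an ARCH(1) model,
# computed with the recursion sigma^2_{t+h|t} = omega + alpha * sigma^2_{t+h-1|t}.
# Assumption: omega and alpha are known; r_t is the most recent return.
arch1_forecast <- function(r_t, omega, alpha, h_max) {
  out  <- numeric(h_max)
  prev <- r_t^2                      # convention: sigma^2_{t|t} = r_t^2
  for (h in seq_len(h_max)) {
    out[h] <- omega + alpha * prev   # sigma^2_{t+h|t}
    prev   <- out[h]
  }
  out
}

omega <- 0.01; alpha <- 0.9
fc <- arch1_forecast(r_t = 0.5, omega = omega, alpha = alpha, h_max = 10)

# Closed form implied by the recursion: the forecasts move geometrically
# from r_t^2 toward the long-run variance sigma^2 = omega / (1 - alpha).
sigma2_lr   <- omega / (1 - alpha)
closed_form <- sigma2_lr + alpha^(1:10) * (0.5^2 - sigma2_lr)
all.equal(fc, closed_form)
```

The closed form makes the short memory of the ARCH(1) model explicit: the influence of the current squared return on the forecast decays at the rate $\alpha^h$.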

12.3 GARCH Models


The forecasting formulas derived in the previous section show both the strengths and
weaknesses of an ARCH(1) model, as the forecasting of the future conditional variances
only involves the most recent squared return. In practice, one may expect that the accuracy
of forecasting may improve by including all past squared returns with lesser weight
for more distant volatilities. One approach is to include further lagged squared returns in
the model. The ARCH(q) model, proposed by Engle (1982), generalizes Equation
(12.2.2) by specifying that

$$\sigma^2_{t|t-1} = \omega + \alpha_1 r^2_{t-1} + \alpha_2 r^2_{t-2} + \cdots + \alpha_q r^2_{t-q} \tag{12.3.1}$$

Here, $q$ is referred to as the ARCH order. Another approach, proposed by Bollerslev
(1986) and Taylor (1986), introduces $p$ lags of the conditional variance in the model,
where $p$ is referred to as the GARCH order. The combined model is called the generalized
autoregressive conditional heteroscedasticity, GARCH(p,q), model.

$$
\begin{aligned}
\sigma^2_{t|t-1} = \omega &+ \beta_1\sigma^2_{t-1|t-2} + \cdots + \beta_p\sigma^2_{t-p|t-p-1} \\
  &+ \alpha_1 r^2_{t-1} + \alpha_2 r^2_{t-2} + \cdots + \alpha_q r^2_{t-q}
\end{aligned}
\tag{12.3.2}
$$

In terms of the backshift $B$ notation, the model can be expressed as

$$(1 - \beta_1 B - \cdots - \beta_p B^p)\,\sigma^2_{t|t-1} = \omega + (\alpha_1 B + \cdots + \alpha_q B^q)\, r_t^2 \tag{12.3.3}$$

We note that in some of the literature, the notation GARCH(p,q) is written as
GARCH(q,p); that is, the orders are switched. It can be rather confusing but true that the
two different sets of conventions are used in different software! A reader must find out
which convention is used by the software on hand before fitting or interpreting a
GARCH model.

Because conditional variances must be nonnegative, the coefficients in a GARCH
model are often constrained to be nonnegative. However, the nonnegative parameter
constraints are not necessary for a GARCH model to have nonnegative conditional variances
with probability 1; see Nelson and Cao (1992) and Tsai and Chan (2006). Allowing
the parameter values to be negative may increase the dynamical patterns that can be
captured by the GARCH model. We shall return to this issue later. Henceforth, within
this section, we shall assume the nonnegative constraint for the GARCH parameters.

Exhibit 12.12 shows the time series plot of a time series, of size 500, simulated
from a GARCH(1,1) model with standard normal innovations and parameter values
$\omega = 0.02$, $\alpha = 0.05$, and $\beta = 0.9$. Volatility clustering is evident in the plot, as large
(small) fluctuations are usually succeeded by large (small) fluctuations. Moreover, the
inclusion of the lag 1 of the conditional variance in the model successfully enhances the
memory in the volatility.


Exhibit 12.12 Simulated GARCH(1,1) Process

> set.seed(1234567)
> garch11.sim=garch.sim(alpha=c(0.02,0.05),beta=.9,n=500)
> plot(garch11.sim,type='l',ylab=expression(r[t]),xlab='t')


Except for lags 3 and 20, which are mildly significant, the sample ACF and PACF
of the simulated data, shown in Exhibits 12.13 and 12.14, do not show significant
correlations. Hence, the simulated process seems to be basically serially uncorrelated as it is.

Exhibit 12.13 Sample ACF of Simulated GARCH(1,1) Process

> acf(garch11.sim)


Exhibit 12.14 Sample PACF of Simulated GARCH(1,1) Process

> pacf(garch11.sim)


Exhibits 12.15 through 12.18 show the sample ACF and PACF of the absolute values
and the squares of the simulated data.

Exhibit 12.15 Sample ACF of the Absolute Values of the Simulated
GARCH(1,1) Process

> acf(abs(garch11.sim))


Exhibit 12.16 Sample PACF of the Absolute Values of the Simulated
GARCH(1,1) Process

> pacf(abs(garch11.sim))


These plots indicate the existence of significant autocorrelation patterns in the
absolute and squared data and indicate that the simulated process is in fact serially
dependent. Interestingly, the lag 1 autocorrelations are not significant in any of these last
four plots.

Exhibit 12.17 Sample ACF of the Squared Values of the Simulated
GARCH(1,1) Process

> acf(garch11.sim^2)


Exhibit 12.18 Sample PACF of the Squared Values of the Simulated
GARCH(1,1) Process

> pacf(garch11.sim^2)


For model identification of the GARCH orders, it is again advantageous to express
the model for the conditional variances in terms of the squared returns. Recall the
definition $\eta_t = r_t^2 - \sigma^2_{t|t-1}$. Similar to the ARCH(1) model, we can show that $\{\eta_t\}$ is a
serially uncorrelated sequence. Moreover, $\eta_t$ is uncorrelated with past squared returns.
Substituting the expression $\sigma^2_{t|t-1} = r_t^2 - \eta_t$ into Equation (12.3.2) yields

$$
\begin{aligned}
r_t^2 = \omega &+ (\beta_1 + \alpha_1) r^2_{t-1} + \cdots
   + (\beta_{\max(p,q)} + \alpha_{\max(p,q)})\, r^2_{t-\max(p,q)} \\
  &+ \eta_t - \beta_1\eta_{t-1} - \cdots - \beta_p\eta_{t-p}
\end{aligned}
\tag{12.3.4}
$$

where $\beta_k = 0$ for all integers $k > p$ and $\alpha_k = 0$ for $k > q$. This shows that the GARCH(p,q)
model for the return series implies that the model for the squared returns is an
ARMA(max(p,q), p) model. Thus, we can apply the model identification techniques for
ARMA models to the squared return series to identify $p$ and max(p,q). Notice that if $q$ is
smaller than $p$, it will be masked in the model identification. In such cases, we can first
fit a GARCH(p,p) model and then estimate $q$ by examining the significance of the
resulting ARCH coefficient estimates.
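
For the GARCH(1,1) model the substitution can be written out in full. The following sketch uses only the definition of $\eta_t$ and the GARCH(1,1) special case of the model equation:

```latex
\begin{aligned}
\sigma^2_{t|t-1} &= \omega + \beta_1\sigma^2_{t-1|t-2} + \alpha_1 r^2_{t-1}, \\
r_t^2 - \eta_t &= \omega + \beta_1\,(r^2_{t-1} - \eta_{t-1}) + \alpha_1 r^2_{t-1}, \\
r_t^2 &= \omega + (\alpha_1 + \beta_1)\, r^2_{t-1} + \eta_t - \beta_1\eta_{t-1}.
\end{aligned}
```

The squared returns thus follow an ARMA(1,1) model whose AR coefficient is $\alpha_1 + \beta_1$ and whose MA coefficient, in the convention $\eta_t - \theta\eta_{t-1}$, is $\theta = \beta_1$.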

As  an  illustration,  Exhibit  12.19  shows  the  sample  EACF  of  the  squared  values 
from  the  simulated  GARCH(1,1)  series. 


Exhibit 12.19 Sample EACF for the Squared Simulated GARCH(1,1) Series

AR/MA   0  1  2  3  4  5  6  7  8  9 10 11 12 13
0       o  o  x  x  o  o  x  o  o  o  o  o  o  o
1       x  o  o  o  x  o  x  x  o  o  o  o  o  o
2       x  o  o  o  o  o  x  o  o  o  o  o  o  o
3       x  x  x  o  o  x  o  o  o  o  o  o  o  o
4       x  x  o  x  x  o  o  o  o  o  o  o  o  o
5       x  o  x  x  o  o  o  o  o  o  o  o  o  o
6       x  o  x  x  o  x  o  o  o  o  o  o  o  o
7       x  x  x  x  x  x  o  o  o  o  o  o  o  o

> eacf((garch11.sim)^2)


The  pattern  in  the  EACF  table  is  not  very  clear,  although  an  ARMA(2,2)  model 
seems  to  be  suggested.  The  fuzziness  of  the  signal  in  the  EACF  table  is  likely  caused  by 
the  larger  sampling  variability  when  we  deal  with  higher  moments.  Shin  and  Kang 
(2001)  argued  that,  to  a  first-order  approximation,  a  power  transformation  preserves  the 
theoretical  autocorrelation  function  and  hence  the  order  of  a  stationary  ARMA  process. 
Their  result  suggests  that  the  GARCH  order  may  also  be  identified  by  studying  the 
absolute  returns.  Indeed,  the  sample  EACF  table  for  the  absolute  returns,  shown  in 
Exhibit  12.20,  more  convincingly  suggests  an  ARMA(1,1)  model,  and  therefore  a 
GARCH(1,1)  model  for  the  original  data,  although  there  is  also  a  hint  of  a  GARCH(2,2) 
model. 


Exhibit 12.20 Sample EACF for Absolute Simulated GARCH(1,1) Series

> eacf(abs(garch11.sim))


For  the  absolute  CREF  daily  return  data,  the  sample  EACF  table  is  reported  in 
Exhibit  12.21,  which  suggests  a  GARCH(1,1)  model.  The  corresponding  EACF  table 
for  the  squared  CREF  returns  (not  shown)  is,  however,  less  clear  and  may  suggest  a 
GARCH(2,2)  model. 


Exhibit 12.21 Sample EACF for the Absolute Daily CREF Returns

AR/MA  ...   9  10  11  12  13
0      ...   x   x   o   o   o
1      ...   o   o   o   o   o
2      ...   o   o   o   o   o
3      ...   o   o   o   o   o
4      ...   o   o   o   o   o
5      ...   o   o   o   o   o
6      ...   o   o   o   o   o
7      ...   o   o   o   o   o

> eacf(abs(r.cref))


Furthermore, the parameter estimates of the fitted ARMA model for the absolute
data may yield initial estimates for maximum likelihood estimation of the GARCH
model. For example, Exhibit 12.22 reports the estimated parameters of the fitted
ARMA(1,1) model for the absolute simulated GARCH(1,1) process.

Exhibit 12.22 Parameter Estimates with ARMA(1,1) Model for the Absolute
Simulated GARCH(1,1) Series

Coefficient   ar1       ma1       Intercept
Estimate      0.9821    -0.9445   0.5077
s.e.          0.0134    0.0220    0.0499

> arima(abs(garch11.sim),order=c(1,0,1))


Using Equation (12.3.4), it can be seen that $\beta$ is estimated by 0.9445, $\alpha$ is estimated
by 0.9821 - 0.9445 = 0.03763, and $\omega$ can be estimated as the variance of the original
data times the estimate of $1 - \alpha - \beta$, which equals 0.0073. Amazingly, these estimates
turn out to be quite close to the maximum likelihood estimates reported in the next
section!
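
The bookkeeping behind these initial estimates can be sketched in base R. The variable and function names below are illustrative assumptions. Note that R's `arima()` writes the moving average part with a plus sign, so its reported `ma1` coefficient corresponds to $-\beta$. The inputs are the rounded values from Exhibit 12.22, so the difference agrees with the figure quoted above only up to rounding.

```r
# Rough GARCH(1,1) starting values from the ARMA(1,1) fit in Exhibit 12.22.
# arima() reports the MA coefficient with a plus-sign convention, so the
# fitted ma1 corresponds to -beta.
ar1 <- 0.9821
ma1 <- -0.9445

beta_init  <- -ma1              # estimate of beta
alpha_init <- ar1 - beta_init   # estimate of alpha, since ar1 = alpha + beta

# omega = (variance of the original series) * (1 - alpha - beta).
# The sample variance is not reproduced here, so it is left as an argument.
omega_init <- function(sample_var) sample_var * (1 - alpha_init - beta_init)

c(alpha = alpha_init, beta = beta_init)
```

A call such as `omega_init(var(garch11.sim))` would then supply the starting value for $\omega$, assuming the simulated series from Exhibit 12.12 is available in the workspace.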

We now derive the condition for a GARCH model to be weakly stationary. Assume
for the moment that the return process is weakly stationary. Taking expectations on both
sides of Equation (12.3.4) gives an equation for the unconditional variance $\sigma^2$

$$\sigma^2 = \omega + \sigma^2 \sum_{i=1}^{\max(p,q)} (\beta_i + \alpha_i) \tag{12.3.5}$$

so that

$$\sigma^2 = \frac{\omega}{1 - \sum_{i=1}^{\max(p,q)} (\beta_i + \alpha_i)} \tag{12.3.6}$$

which is finite if

$$\sum_{i=1}^{\max(p,q)} (\beta_i + \alpha_i) < 1 \tag{12.3.7}$$

This condition can be shown to be necessary and sufficient for the weak stationarity of a
GARCH(p,q) model. (Recall that we have implicitly assumed that $\alpha_1 \ge 0, \ldots, \alpha_q \ge 0$,
and $\beta_1 \ge 0, \ldots, \beta_p \ge 0$.) Henceforth, we assume $p = q$ for ease of notation.

As in the case of an ARCH(1) model, finiteness of higher moments of the GARCH
model requires further stringent conditions on the coefficients; see Ling and McAleer
(2002). Also, the stationary distribution of a GARCH model is generally fat-tailed even
if the innovations are normal.

In terms of forecasting the $h$-step-ahead conditional variance $\sigma^2_{t+h|t}$, we can repeat
the arguments used in the preceding section to derive the recursive formula that for $h > p$

$$\sigma^2_{t+h|t} = \omega + \sum_{i=1}^{p} (\alpha_i + \beta_i)\,\sigma^2_{t+h-i|t} \tag{12.3.8}$$

More generally, for arbitrary $h \ge 1$, the formula is more complex, as

$$\sigma^2_{t+h|t} = \omega + \sum_{i=1}^{p} \alpha_i\,\sigma^2_{t+h-i|t}
   + \sum_{i=1}^{p} \beta_i\,\hat{\sigma}^2_{t+h-i|t+h-i-1} \tag{12.3.9}$$

where

$$\sigma^2_{t+h|t} = r^2_{t+h} \quad \text{for } h \le 0 \tag{12.3.10}$$

and

$$
\hat{\sigma}^2_{t+h-i|t+h-i-1} =
\begin{cases}
\sigma^2_{t+h-i|t} & \text{for } h - i - 1 > 0 \\
\sigma^2_{t+h-i|t+h-i-1} & \text{otherwise}
\end{cases}
\tag{12.3.11}
$$
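
As a worked special case, the general recursion simplifies considerably for the GARCH(1,1) model. The sketch below assumes $p = q = 1$ and $\alpha_1 + \beta_1 < 1$, with $\sigma^2 = \omega/(1 - \alpha_1 - \beta_1)$ denoting the long-run variance:

```latex
\begin{aligned}
\sigma^2_{t+1|t} &= \omega + \alpha_1 r_t^2 + \beta_1\sigma^2_{t|t-1}, \\
\sigma^2_{t+h|t} &= \omega + (\alpha_1 + \beta_1)\,\sigma^2_{t+h-1|t}, \qquad h \ge 2, \\
\sigma^2_{t+h|t} &= \sigma^2 + (\alpha_1 + \beta_1)^{h-1}\,\bigl(\sigma^2_{t+1|t} - \sigma^2\bigr), \qquad h \ge 1.
\end{aligned}
```

The last line follows by subtracting $\sigma^2 = \omega + (\alpha_1 + \beta_1)\sigma^2$ from the second line and iterating. The forecasts revert to the long-run variance at the geometric rate $\alpha_1 + \beta_1$, so values of $\alpha_1 + \beta_1$ close to 1 correspond to highly persistent volatility.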


The computation of the conditional variances may be best illustrated using the
GARCH(1,1) model. Suppose that there are $n$ observations $r_1, r_2, \ldots, r_n$ and

$$\sigma^2_{t|t-1} = \omega + \alpha_1 r^2_{t-1} + \beta_1\sigma^2_{t-1|t-2} \tag{12.3.12}$$

To compute the conditional variances for $2 \le t \le n$, we need to set the initial value $\sigma^2_{1|0}$.
This may be set to the stationary unconditional variance $\sigma^2 = \omega/(1 - \alpha_1 - \beta_1)$ under the
stationarity assumption or simply as $r_1^2$. Thereafter, we can compute $\sigma^2_{t|t-1}$ by the
formula defining the GARCH model. It is interesting to observe that

$$\sigma^2_{t|t-1} = (1 - \alpha_1 - \beta_1)\,\sigma^2 + \alpha_1 r^2_{t-1} + \beta_1\sigma^2_{t-1|t-2} \tag{12.3.13}$$

so that the estimate of the one-step-ahead conditional volatility is a weighted average of
the long-run variance, the current squared return, and the current estimate of the
conditional volatility. Further, the MA($\infty$) representation of the conditional variance implies
that

$$\sigma^2_{t|t-1} = \frac{\omega}{1 - \beta_1} + \alpha_1\,(r^2_{t-1} + \beta_1 r^2_{t-2} + \beta_1^2 r^2_{t-3} + \beta_1^3 r^2_{t-4} + \cdots) \tag{12.3.14}$$

an infinite moving average of past squared returns. The formula shows that the squared
returns in the distant past receive exponentially diminishing weights. In contrast, simple
moving averages of the squared returns are sometimes used to estimate the conditional
variance. These, however, suffer much larger bias.
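
The recursive computation of the conditional variances can be sketched in base R as follows. The function name `garch11_filter` and the short return vector are made up for illustration, and the parameters are treated as known.

```r
# Conditional variances of a GARCH(1,1) model computed recursively.
# Assumptions: known parameters with alpha + beta < 1; the recursion starts
# at the stationary variance unless 'init' is supplied (e.g. init = r[1]^2).
garch11_filter <- function(r, omega, alpha, beta,
                           init = omega / (1 - alpha - beta)) {
  n <- length(r)
  sigma2 <- numeric(n)
  sigma2[1] <- init                          # sigma^2_{1|0}
  for (t in seq_len(n)[-1]) {
    sigma2[t] <- omega + alpha * r[t - 1]^2 + beta * sigma2[t - 1]
  }
  sigma2
}

# Toy returns, made up purely for illustration
r  <- c(0.3, -1.2, 0.4, 0.1, -0.2, 2.0, -1.5, 0.6)
s2 <- garch11_filter(r, omega = 0.02, alpha = 0.05, beta = 0.9)

# The same values written as a weighted average of the long-run variance,
# the previous squared return, and the previous conditional variance
lr  <- 0.02 / (1 - 0.05 - 0.9)
alt <- (1 - 0.05 - 0.9) * lr + 0.05 * r[-8]^2 + 0.9 * s2[-8]
all.equal(s2[-1], alt)
```

Passing `init = r[1]^2` gives the alternative initialization mentioned in the text; with a long series the choice of initial value has little effect on the later conditional variances because its weight decays like $\beta_1^t$.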

If $\alpha_1 + \beta_1 = 1$, then the GARCH(1,1) model is nonstationary and instead is called an
IGARCH(1,1) model with the letter I standing for integrated. In such a case, we shall
drop the subscript from the notation and let $\alpha = 1 - \beta$. Suppose that $\omega = 0$. Then

$$\sigma^2_{t|t-1} = (1 - \beta)\,(r^2_{t-1} + \beta r^2_{t-2} + \beta^2 r^2_{t-3} + \beta^3 r^2_{t-4} + \cdots), \tag{12.3.15}$$

an exponentially weighted average of the past squared returns. The famed Riskmetrics
software in finance employs the IGARCH(1,1) model with $\beta = 0.94$ for estimating
conditional variances; see Andersen et al. (2006).
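
The exponentially weighted average is usually computed recursively rather than as a long sum. The base R sketch below illustrates this; the function name `ewma_variance` and the return values are made up, and treating the unobserved pre-sample squared returns as zero is an assumption that truncates the infinite sum.

```r
# Exponentially weighted moving average of squared returns, computed via
# sigma^2_{t+1|t} = (1 - beta) * r_t^2 + beta * sigma^2_{t|t-1}.
# Assumptions: omega = 0 and pre-sample squared returns set to zero (init = 0).
ewma_variance <- function(r, beta = 0.94, init = 0) {
  n <- length(r)
  sigma2 <- numeric(n + 1)
  sigma2[1] <- init
  for (t in seq_len(n)) {
    sigma2[t + 1] <- (1 - beta) * r[t]^2 + beta * sigma2[t]
  }
  sigma2[-1]   # element t is the variance forecast made after observing r[t]
}

r <- c(0.5, -1.0, 0.2, 1.5)   # made-up returns for illustration
v <- ewma_variance(r)

# Direct evaluation of the truncated weighted sum for the last forecast
direct <- (1 - 0.94) * sum(0.94^(0:3) * rev(r)^2)
all.equal(v[4], direct)
```

With only a handful of observations the truncated weights do not sum to 1, so the early values are biased toward the initial value; the effect fades as more returns accumulate.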

12.4 Maximum Likelihood Estimation


The likelihood function of a GARCH model can be readily derived for the case of normal
innovations. We illustrate the computation for the case of a stationary GARCH(1,1)
model. Extension to the general case is straightforward. Given the parameters $\omega$, $\alpha$, and
$\beta$, the conditional variances can be computed recursively by the formula

$$\sigma^2_{t|t-1} = \omega + \alpha r^2_{t-1} + \beta\sigma^2_{t-1|t-2} \tag{12.4.1}$$

for $t \ge 2$, with the initial value, $\sigma^2_{1|0}$, set under the stationarity assumption as the
stationary unconditional variance $\sigma^2 = \omega/(1 - \alpha - \beta)$. We use the conditional pdf

$$f(r_t \mid r_{t-1}, \ldots, r_1) = \frac{1}{\sqrt{2\pi\sigma^2_{t|t-1}}} \exp[-r_t^2/(2\sigma^2_{t|t-1})] \tag{12.4.2}$$

and the joint pdf

$$f(r_n, \ldots, r_1) = f(r_{n-1}, \ldots, r_1)\, f(r_n \mid r_{n-1}, \ldots, r_1) \tag{12.4.3}$$

Iterating this last formula and taking logs gives the following formula for the
log-likelihood function:

$$L(\omega, \alpha, \beta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{n} \left\{ \log(\sigma^2_{t|t-1}) + r_t^2/\sigma^2_{t|t-1} \right\} \tag{12.4.4}$$
There is no closed-form solution for the maximum likelihood estimators of $\omega$, $\alpha$, and $\beta$,
but they can be computed by maximizing the log-likelihood function numerically. The
maximum likelihood estimators can be shown to be approximately normally distributed
with the true parameter values as their means. Their covariances may be collected into a
matrix denoted by A, which can be obtained as follows. Let

$$\theta = \begin{pmatrix} \omega \\ \alpha \\ \beta \end{pmatrix} \tag{12.4.5}$$

be the vector of parameters. Write the $i$th component of $\theta$ as $\theta_i$ so that $\theta_1 = \omega$, $\theta_2 = \alpha$,
and $\theta_3 = \beta$. The diagonal elements of A are the approximate variances of the estimators,
whereas the off-diagonal elements are their approximate covariances. So, the first
diagonal element of A is the approximate variance of $\hat{\omega}$, the (1,2)th element of A is the
approximate covariance between $\hat{\omega}$ and $\hat{\alpha}$, and so forth. We now outline the
computation of A. Readers not interested in the mathematical details may skip the rest of this
paragraph. The 3x3 matrix A is approximately equal to the inverse matrix of the 3x3
matrix whose $(i, j)$th element equals

$$\frac{1}{2}\sum_{t=1}^{n} \frac{1}{\sigma^4_{t|t-1}}\,
  \frac{\partial\sigma^2_{t|t-1}}{\partial\theta_i}\,
  \frac{\partial\sigma^2_{t|t-1}}{\partial\theta_j} \tag{12.4.6}$$

The partial derivatives in this expression can be obtained recursively by differentiating
Equation (12.4.1). For example, differentiating both sides of Equation (12.4.1) with
respect to $\omega$ yields the recursive formula

$$\frac{\partial\sigma^2_{t|t-1}}{\partial\omega} = 1 + \beta\,\frac{\partial\sigma^2_{t-1|t-2}}{\partial\omega} \tag{12.4.7}$$

Other partial derivatives can be computed similarly.
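
To make the numerical maximization concrete, here is a minimal base R sketch of the Gaussian log-likelihood of a stationary GARCH(1,1) model together with a call to `optim()`. The function name `garch11_loglik`, the simulated data, the seed, and the starting values are all illustrative assumptions; this is not how `garch()` in the tseries package is implemented, and a dedicated routine should be preferred in practice.

```r
# Gaussian log-likelihood of a stationary GARCH(1,1) model and its
# numerical maximization. A sketch under stated assumptions only.
garch11_loglik <- function(theta, r) {
  omega <- theta[1]; alpha <- theta[2]; beta <- theta[3]
  # Reject values outside the nonnegative, stationary parameter region
  if (omega <= 0 || alpha < 0 || beta < 0 || alpha + beta >= 1) return(-Inf)
  n <- length(r)
  sigma2 <- numeric(n)
  sigma2[1] <- omega / (1 - alpha - beta)        # initial value sigma^2_{1|0}
  for (t in seq_len(n)[-1]) {
    sigma2[t] <- omega + alpha * r[t - 1]^2 + beta * sigma2[t - 1]
  }
  -n / 2 * log(2 * pi) - 0.5 * sum(log(sigma2) + r^2 / sigma2)
}

# Simulate a GARCH(1,1) series with omega = 0.02, alpha = 0.05, beta = 0.9
set.seed(2024)
n <- 1000; omega <- 0.02; alpha <- 0.05; beta <- 0.9
r  <- numeric(n)
s2 <- omega / (1 - alpha - beta)
for (t in seq_len(n)) {
  r[t] <- sqrt(s2) * rnorm(1)                    # return given sigma^2_{t|t-1}
  s2   <- omega + alpha * r[t]^2 + beta * s2     # next conditional variance
}

# optim() minimizes, so the negative log-likelihood is supplied
fit <- optim(c(0.05, 0.1, 0.8), function(th) -garch11_loglik(th, r),
             method = "Nelder-Mead")
fit$par   # numerical estimates of omega, alpha, beta
```

Returning `-Inf` outside the admissible region is a simple way to keep a derivative-free optimizer inside the constraints; the estimates it produces can be sensitive to the starting values, in line with the remarks below on the sample sizes GARCH fitting requires.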

Recall that, in the previous section, the simulated GARCH(1,1) series was identified
to be either a GARCH(1,1) model or a GARCH(2,2) model. The model fit of the
GARCH(2,2) model is reported in Exhibit 12.23, where the estimate of $\omega$ is denoted by
a0, that of $\alpha_1$ by a1, that of $\beta_1$ by b1, and so forth. Note that none of the coefficients is
significant, although a2 is close to being significant. The model fit for the GARCH(1,1)
model is given in Exhibit 12.24.


Exhibit 12.23 Estimates for GARCH(2,2) Model of a Simulated
GARCH(1,1) Series

Coefficient   Estimate    Std. Error   t-value   Pr(>|t|)
a0            1.835e-02   1.515e-02    1.211     0.2257
a1            4.09e-15    4.723e-02    8.7e-14   1.0000
a2            1.136e-01   5.855e-02    1.940     0.0524
b1            3.369e-01   3.696e-01    0.911     0.3621
b2            5.100e-01   3.575e-01    1.426     0.1538

> g1=garch(garch11.sim,order=c(2,2))
> summary(g1)


Exhibit 12.24 Estimates for GARCH(1,1) Model of a Simulated
GARCH(1,1) Series

Coefficient   Estimate   Std. Error   t-value   Pr(>|t|)
a0            0.007575   0.007590     0.998     0.3183
a1            0.047184   0.022308     2.115     0.0344
b1            0.935377   0.035839     26.100    < 0.0001

> g2=garch(garch11.sim,order=c(1,1))
> summary(g2)

Now all coefficient estimates (except a0) are significant. The AIC of the fitted
GARCH(2,2) model is 961.0, while that of the fitted GARCH(1,1) model is 958.0, and
thus the GARCH(1,1) model provides a better fit to the data. (Here, AIC is defined as
minus two times the log-likelihood of the fitted model plus twice the number of
parameters. As in the case of ARIMA models, a smaller AIC is preferable.) A 95% confidence
interval for a parameter is given (approximately) by the estimate ±1.96 times its
standard error. So, an approximate 95% confidence interval for $\omega$ equals (-0.0073, 0.022),
that of $\alpha_1$ equals (0.00345, 0.0909), and that of $\beta_1$ equals (0.865, 1.01). These all contain
their true values of 0.02, 0.05, and 0.9, respectively. Note that the standard error of b1 is
0.0358. Since the standard error is approximately proportional to $1/\sqrt{n}$, the standard
error of b1 is expected to be about 0.0566 (0.0462) if the sample size $n$ is 200 (300).
Indeed, fitting the GARCH(1,1) model to the first 200 simulated data, b1 was found to
equal 0.0603 with standard error equal to 50.39! When the sample size was increased to
300, b1 became 0.935 with standard error equal to 0.0449. This example illustrates that
fitting a GARCH model generally requires a large sample size for the theoretical
sampling distribution to be valid and useful; see Shephard (1996, p. 10) for a relevant
discussion.
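
The interval and standard-error arithmetic in this paragraph is easy to reproduce. The base R sketch below uses the estimates and standard errors from Exhibit 12.24; the object and function names are illustrative, and small discrepancies from the quoted figures reflect rounding of the inputs.

```r
# Approximate 95% confidence intervals and the 1/sqrt(n) scaling of the
# standard error, using the values reported in Exhibit 12.24.
est <- c(a0 = 0.007575, a1 = 0.047184, b1 = 0.935377)
se  <- c(a0 = 0.007590, a1 = 0.022308, b1 = 0.035839)

ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
round(ci, 4)

# Standard error of b1 expected at a smaller sample size, scaling from n = 500
se_b1_at <- function(n_new, n_old = 500) se[["b1"]] * sqrt(n_old / n_new)
round(c(n200 = se_b1_at(200), n300 = se_b1_at(300)), 4)
```

The scaling rule is only a large-sample guide, which is exactly the point of the n = 200 fit reported above: the realized standard error there is nowhere near the value the rule predicts.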

For the CREF return data, we earlier identified either a GARCH(1,1) or
GARCH(2,2) model. The AIC of the fitted GARCH(1,1) model is 969.6, whereas that
of the GARCH(2,2) model is 970.3. Hence the GARCH(1,1) model provides a marginally
better fit to the data. Maximum likelihood estimates of the fitted GARCH(1,1)
model are reported in Exhibit 12.25.


Exhibit 12.25 Maximum Likelihood Estimates of the GARCH(1,1) Model for
the CREF Stock Returns

Parameter   Estimate*   Std. Error   t-value   Pr(>|t|)
a0          0.01633     0.01237      1.320     0.1869
a1          0.04414     0.02097      2.105     0.0353
b1          0.91704     0.04570      20.066    < 0.0001

* As remarked earlier, the analysis depends on the scale of measurement. In
particular, a GARCH(1,1) model based on the raw CREF stock returns yields
estimates a0 = 0.00000511, a1 = 0.0941, and b1 = 0.789.

> m1=garch(x=r.cref,order=c(1,1))
> summary(m1)


Note that the long-term variance of the GARCH(1,1) model is estimated to be

$$\hat{\omega}/(1 - \hat{\alpha} - \hat{\beta}) = 0.01633/(1 - 0.04414 - 0.91704) = 0.4206 \tag{12.4.8}$$

which is very close to the sample variance of 0.4161.

In practice, the innovations need not be normally distributed. In fact, many financial
time series appear to have nonnormal innovations. Nonetheless, we can proceed to estimate
the GARCH model by pretending that the innovations are normal. The resulting likelihood
function is called the Gaussian likelihood, and estimators maximizing the Gaussian
likelihood are called the quasi-maximum likelihood estimators (QMLEs). It can be shown
that, under some mild regularity conditions, including stationarity, the quasi-maximum
likelihood estimators are approximately normal, centered at the true parameter values,
and their covariance matrix equals $[(\kappa + 2)/2]\Lambda$, where $\kappa$ is the
(excess) kurtosis of the innovations and $\Lambda$ is the covariance matrix assuming the
innovations are normally distributed; see the discussion above for the normal case. Note
that heavy-tailed innovations will inflate the covariance matrix and hence result in less
reliable parameter estimates. When the innovations are deemed nonnormal, this result
suggests a simple way to adjust the standard errors of the quasi-maximum likelihood
estimates: multiply the standard errors reported by a routine that assumes normal
innovations by $\sqrt{(\kappa + 2)/2}$, where $\kappa$ can be replaced by the sample
(excess) kurtosis of the standardized residuals that are defined below. It should be
noted that one disadvantage of QMLE is that the AIC is not strictly applicable.
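The standard error adjustment can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the vector std_res is simulated from a scaled t-distribution as a stand-in for standardized residuals, and se_gauss holds hypothetical Gaussian-likelihood standard errors; neither is taken from the CREF fit.

```r
# Sketch: inflate Gaussian-likelihood standard errors by sqrt((kappa + 2)/2).
set.seed(123)

# Stand-in for standardized residuals (assumption): t with 8 df, scaled to unit variance.
std_res <- rt(2000, df = 8) / sqrt(8 / 6)

# Sample excess kurtosis.
excess_kurtosis <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3
}
kappa_hat <- excess_kurtosis(std_res)

# Hypothetical standard errors from a routine that assumes normal innovations.
se_gauss <- c(a0 = 0.004, a1 = 0.010, b1 = 0.020)

# Adjusted (quasi-maximum likelihood) standard errors.
inflation <- sqrt((kappa_hat + 2) / 2)
se_qmle <- se_gauss * inflation

print(round(c(kappa = kappa_hat, inflation = inflation), 3))
print(round(se_qmle, 4))
```

Because the excess kurtosis is never below -2, the inflation factor is always well defined; it exceeds 1 exactly when the residuals are heavier-tailed than the normal distribution.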

Let the estimated conditional standard deviation be denoted by $\hat{\sigma}_{t|t-1}$.
The standardized residuals are then defined as

$\hat{\varepsilon}_t = r_t/\hat{\sigma}_{t|t-1}$  (12.4.9)


The standardized residuals from the fitted model are proxies for the innovations and can
be examined to cast light on the distributional form of the innovations. Once a
(parameterized) distribution for the innovations is specified, for example a
t-distribution, the corresponding likelihood function can be derived and optimized to
obtain maximum likelihood estimators; see Tsay (2005) for details. The price of not
correctly specifying the distributional form of the innovations is a loss in efficiency
of estimation, although, with large datasets, the computational convenience of the
Gaussian likelihood approach may outweigh the loss of estimation efficiency.

Before we accept a fitted model and interpret its findings, it is essential to check
whether the model is correctly specified, that is, whether the model assumptions are
supported by the data. If some key model assumptions seem to be violated, then a new
model should be specified, fitted, and checked again until a model is found that provides
an adequate fit to the data. Recall that the standardized residuals are defined as

$\hat{\varepsilon}_t = r_t/\hat{\sigma}_{t|t-1}$  (12.5.1)

which are approximately independently and identically distributed if the model is
correctly specified. As in the case of model diagnostics for ARIMA models, the
standardized residuals are very useful for checking the model specification. The
normality assumption of the innovations can be explored by plotting the QQ normal scores
plot. Deviations from a straight line pattern in the QQ plot furnish evidence against
normality and may provide clues on the distributional form of the innovations. The
Shapiro-Wilk test and the Jarque-Bera test are helpful for formally testing the normality
of the innovations.
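As a rough illustration of these normality checks, the Jarque-Bera statistic can be computed directly from the sample skewness and excess kurtosis, using the standard form n(S^2/6 + K^2/24) referred to a chi-square distribution with 2 degrees of freedom. In the sketch below the helper name jarque_bera is our own, and std_res is simulated normal data standing in for standardized residuals; shapiro.test() is the base R implementation of the Shapiro-Wilk test.

```r
# Sketch: Jarque-Bera statistic computed by hand, alongside the Shapiro-Wilk test.
jarque_bera <- function(x) {
  n <- length(x)
  z <- x - mean(x)
  s2 <- mean(z^2)
  skew <- mean(z^3) / s2^1.5        # sample skewness
  kurt <- mean(z^4) / s2^2 - 3      # sample excess kurtosis
  stat <- n * (skew^2 / 6 + kurt^2 / 24)
  list(skewness = skew, kurtosis = kurt, statistic = stat,
       p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

set.seed(1)
std_res <- rnorm(500)   # stand-in for standardized residuals (assumption)

jb <- jarque_bera(std_res)
print(unlist(jb))
print(shapiro.test(std_res)$p.value)
```

Large p-values from both tests would be read as no evidence against normality of the innovations.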

For the GARCH(1,1) model fitted to the simulated GARCH(1,1) process, the sample skewness
and kurtosis of the standardized residuals equal -0.0882 and -0.104, respectively.
Moreover, both the Shapiro-Wilk test and the Jarque-Bera test suggest that the
standardized residuals are normal.

For the GARCH(1,1) model fitted to the CREF return data, the standardized residuals are
plotted in Exhibit 12.26. There is some tendency for the residuals to be larger in
magnitude towards the end of the study period, perhaps suggesting that there is some
residual pattern in the volatility. The QQ plot of the standardized residuals is shown in
Exhibit 12.27. The QQ plot shows a largely straight-line pattern. The skewness and the
kurtosis of the standardized residuals are 0.0341 and 0.205, respectively. The p-value of
the Jarque-Bera test equals 0.58 and that of the Shapiro-Wilk test is 0.34. Hence, the
normality assumption cannot be rejected.


12.5  Model  Diagnostics 




Exhibit 12.27 QQ Normal Scores Plot of Standardized Residuals from the
Fitted GARCH(1,1) Model of Daily CREF Returns

[Figure: normal QQ plot; horizontal axis: Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(residuals(m1)); qqline(residuals(m1))


If the GARCH model is correctly specified, then the standardized residuals
$\{\hat{\varepsilon}_t\}$ should be close to independently and identically distributed.
The independently and identically distributed assumption of the innovations can be
checked by examining their sample acf. Recall that the portmanteau statistic equals

$n \sum_{k=1}^{m} \hat{\rho}_k^2$

where $\hat{\rho}_k$ is the lag k autocorrelation of the standardized residuals and n is
the sample size. (Recall that the same statistic is also known as the Box-Pierce
statistic and, in a modified version, the Ljung-Box statistic.) Furthermore, it can be
shown that the test statistic is approximately $\chi^2$ distributed with m degrees of
freedom under the null hypothesis that the model is correctly specified. This result
relies on the fact that the sample autocorrelations of nonzero lags from an independently
and identically distributed sequence are approximately independent and normally
distributed with zero mean and variance 1/n, and this result holds approximately also for
the sample autocorrelations of the standardized residuals if the data are truly generated
by a GARCH model of the same orders as those of the fitted model. However, the
portmanteau test does not have strong power against uncorrelated and yet serially
dependent innovations. In fact, we start out with the assumption that the return data are
uncorrelated, so the preceding test is of little interest.
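For concreteness, the portmanteau statistic above can be computed by hand and compared with the Box.test() function in base R. This is a minimal sketch: std_res is simulated white noise standing in for standardized residuals, and m = 10 is an arbitrary choice of the number of lags.

```r
# Sketch: Box-Pierce portmanteau statistic n * sum(rho_k^2), k = 1, ..., m.
set.seed(42)
std_res <- rnorm(300)   # stand-in for standardized residuals (assumption)
n <- length(std_res)
m <- 10                 # number of lags (arbitrary choice)

# Sample autocorrelations at lags 1..m (drop lag 0).
rho <- acf(std_res, lag.max = m, plot = FALSE)$acf[-1]

Q <- n * sum(rho^2)
p_value <- pchisq(Q, df = m, lower.tail = FALSE)

# The same statistic from base R.
bp <- Box.test(std_res, lag = m, type = "Box-Pierce")

print(c(Q = Q, p.value = p_value))
print(c(Q = unname(bp$statistic), p.value = bp$p.value))
```

Setting type = "Ljung-Box" in Box.test() gives the modified version of the statistic.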

More useful tests may be devised by studying the autocorrelation structure of the
absolute standardized residuals or the squared standardized residuals. Let the lag k
autocorrelation of the absolute standardized residuals be denoted by $\hat{\rho}_{k,1}$
and that of the squared standardized residuals by $\hat{\rho}_{k,2}$. Unfortunately, the
approximate $\chi^2$ distribution with m degrees of freedom for the corresponding
portmanteau statistics based on $\hat{\rho}_{k,1}$ ($\hat{\rho}_{k,2}$) is no longer
valid, the reason being that the estimation of the unknown parameters induces a
nonnegligible effect on the tests. Li and Mak (1994) showed that the $\chi^2$ approximate
distribution may be preserved by replacing the sum of squared autocorrelations by a
quadratic form in the autocorrelations; see also Li (2003). For the absolute standardized
residuals, the test statistic takes the form

$n \sum_{i=1}^{m} \sum_{j=1}^{m} q_{i,j} \hat{\rho}_{i,1} \hat{\rho}_{j,1}$  (12.5.2)

We shall call this modified test statistic the generalized portmanteau test statistic.
However, the q's depend on m, the number of lags, and they are specific to the underlying
true model and so must be estimated from the data. For the squared residuals, the q's
take different values. See Appendix I on page 318 for the formulas for the q's.

We illustrate the generalized portmanteau test with the CREF data. Exhibit 12.28 plots
the sample ACF of the squared standardized residuals from the fitted GARCH(1,1) model.
The (individual) critical limits in the figure are based on the 1/n nominal variance
under the assumption of independently and identically distributed data. As discussed
above, this nominal value could be very different from the actual variance of the
autocorrelations of the squared residuals even when the model is correctly specified.
Nonetheless, the general impression from the figure is that the squared residuals are
serially uncorrelated.


Exhibit 12.28 Sample ACF of Squared Standardized Residuals from the
GARCH(1,1) Model of the Daily CREF Returns

[Figure: sample ACF plotted against Lag]

> acf(residuals(m1)^2,na.action=na.omit)


Exhibit 12.29 displays the p-values of the generalized portmanteau tests with the
squared standardized residuals from the fitted GARCH(1,1) model of the CREF data for
m = 1 to 20. All p-values are higher than 5%, suggesting that the squared residuals are
uncorrelated over time, and hence the standardized residuals may be independent.

Exhibit 12.29 Generalized Portmanteau Test p-Values for the Squared
Standardized Residuals for the GARCH(1,1) Model of the
Daily CREF Returns

[Figure: p-values of the generalized portmanteau test plotted against the number of lags]

> gBox(m1,method='squared')


We repeated the model check using the absolute standardized residuals; see Exhibits
12.30 and 12.31. The lag 2 autocorrelation of the absolute residuals is significant
according to the nominal critical limits shown. Furthermore, the generalized portmanteau
tests are significant when m = 2 and 3 and marginally not significant at m = 4. The
sample EACF table (not shown) of the absolute standardized residuals suggests an AR(2)
model for the absolute residuals and hence points to the possibility that the CREF
returns may be identified as a GARCH(1,2) process. However, fitting a GARCH(1,2) model to
the CREF data did not improve the fit, as its AIC was 978.2, much higher than 969.6,
that of the GARCH(1,1) model. Therefore, we conclude that the fitted GARCH(1,1) model
provides a good fit to the CREF data.

Exhibit 12.30 Sample ACF of the Absolute Standardized Residuals from
the GARCH(1,1) Model for the Daily CREF Returns

[Figure: sample ACF plotted against Lag]

> acf(abs(residuals(m1)),na.action=na.omit)


Exhibit 12.31 Generalized Portmanteau Test p-Values for the Absolute
Standardized Residuals for the GARCH(1,1) Model of the
Daily CREF Returns

[Figure: p-values of the generalized portmanteau test plotted against Lag]

> gBox(m1,method='absolute')

Given that the GARCH(1,1) model provides a good fit to the CREF data, we may use it to
forecast the future conditional variances. Exhibit 12.32 shows the within-sample
estimates of the conditional variances, which capture several periods of high
volatility, especially the one at the end of the study period. At the final time point,
the squared return equals 2.159, and the conditional variance is estimated to be 0.4411.
These values combined with Equations (12.3.8) and (12.3.9) can be used to compute the
forecasts of future conditional variances. For example, the one-step-ahead forecast of
the conditional variance equals 0.01633 + 0.04414*2.159 + 0.91704*0.4411 = 0.5161. The
two-step forecast of the conditional variance equals 0.01633 + 0.04414*0.5161 +
0.91704*0.5161 = 0.5124, and so forth, with the longer lead forecasts eventually
approaching 0.42066, the long-run variance of the model. The conditional variances may
be useful for pricing financial assets through the Black-Scholes formula and calculation
of the value at risk (VaR); see Tsay (2005) and Andersen et al. (2006).
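The forecast recursion just described is easy to reproduce. The following sketch uses only the numbers quoted in the text (the three parameter estimates, the final squared return, and the final conditional variance); the function name garch11_forecast is our own.

```r
# Sketch: h-step-ahead conditional variance forecasts for a GARCH(1,1) model.
omega <- 0.01633; alpha <- 0.04414; beta <- 0.91704
r2_last   <- 2.159    # squared return at the final time point
sig2_last <- 0.4411   # estimated conditional variance at the final time point

garch11_forecast <- function(h_max) {
  f <- numeric(h_max)
  # One step ahead uses the observed squared return and conditional variance.
  f[1] <- omega + alpha * r2_last + beta * sig2_last
  # Beyond one step, the unknown squared return is replaced by its forecast.
  if (h_max >= 2) {
    for (h in 2:h_max) f[h] <- omega + (alpha + beta) * f[h - 1]
  }
  f
}

fc <- garch11_forecast(500)
print(round(fc[1:2], 4))                              # 0.5161 0.5124
print(c(fc[500], omega / (1 - alpha - beta)))         # forecasts approach the long-run variance
```

The forecasts decay geometrically toward the long-run variance at rate alpha + beta = 0.96118, so the convergence is slow for the first few dozen leads.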

It is interesting to note that the need for incorporating ARCH in the data is also
supported by the McLeod-Li test applied to the residuals of the AR(1) + outlier model;
see Exhibit 12.9, page 283.


Exhibit 12.32 Estimated Conditional Variances of the Daily CREF Returns

[Figure: estimated conditional variance plotted against t]

> plot((fitted(m1)[,1])^2,type='l',ylab='Conditional Variance',
xlab='t')


12.6  Conditions  for  the  Nonnegativity  of  the 
Conditional  Variances 


Because the conditional variance $\sigma^2_{t|t-1}$ must be nonnegative, the GARCH
parameters are often constrained to be nonnegative. However, the nonnegativity parameter
constraints need not be necessary for the nonnegativity of the conditional variances.
This issue was first explored by Nelson and Cao (1992) and more recently by Tsai and Chan
(2006). To better understand the problem, first consider the case of an ARCH(q) model.
Then the conditional variance is given by the formula

$\sigma^2_{t|t-1} = \omega + \alpha_1 r_{t-1}^2 + \alpha_2 r_{t-2}^2 + \cdots + \alpha_q r_{t-q}^2$  (12.6.1)

Assume that q consecutive returns can take on any arbitrary set of real numbers. If one
of the $\alpha$'s is negative, say $\alpha_1 < 0$, then $\sigma^2_{t|t-1}$ will be
negative if $r_{t-1}$ is sufficiently large in magnitude and the other r's are
sufficiently close to zero. Hence, it is clear that all $\alpha$'s must be nonnegative
for the conditional variances to be nonnegative. Similarly, by letting the returns be
close to zero, it can be seen that $\omega$ must be nonnegative; otherwise the
conditional variance may become negative. Thus, it is clear that for an ARCH model, the
nonnegativity of all ARCH coefficients is necessary and sufficient for the conditional
variances to be always nonnegative.
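The argument for the ARCH case can be checked numerically. In this small sketch the ARCH(2) coefficients are hypothetical and chosen only so that the first ARCH coefficient is negative; the helper arch_cond_var is our own.

```r
# Sketch: a negative ARCH coefficient can produce a negative "conditional variance".
arch_cond_var <- function(omega, alpha, past_r) {
  # past_r = c(r_{t-1}, ..., r_{t-q}); returns omega + sum(alpha_j * r_{t-j}^2)
  omega + sum(alpha * past_r^2)
}

omega <- 0.1
alpha <- c(-0.05, 0.30)   # hypothetical ARCH(2) coefficients with alpha_1 < 0

v_moderate <- arch_cond_var(omega, alpha, c(0.5, 0.5))  # moderate past returns
v_extreme  <- arch_cond_var(omega, alpha, c(3, 0))      # large r_{t-1}, r_{t-2} = 0

print(c(moderate = v_moderate, extreme = v_extreme))    # 0.1625 and -0.35
```

With moderate returns the formula gives a positive value, but a single large return at lag 1 drives it below zero, which is why every ARCH coefficient must be nonnegative.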

The corresponding problem for a GARCH(p,q) model can be studied by expressing the GARCH
model as an infinite-order ARCH model. The conditional variance process
$\{\sigma^2_{t|t-1}\}$ is an ARMA(p,q) model with the squared returns playing the role
of the noise process. Recall that an ARMA(p,q) model can be expressed as an MA(∞) model
if all the roots of the AR characteristic polynomial lie outside the unit circle. Hence,
assuming that all the roots of $1 - \beta_1 x - \beta_2 x^2 - \cdots - \beta_p x^p = 0$
have magnitude greater than 1, the conditional variances satisfy the equation

$\sigma^2_{t|t-1} = \omega^* + \psi_1 r_{t-1}^2 + \psi_2 r_{t-2}^2 + \cdots$  (12.6.2)

where

$\omega^* = \omega \Big/ \left(1 - \sum_{i=1}^{p} \beta_i\right)$  (12.6.3)

It can be similarly shown that the conditional variances are all nonnegative if and only
if $\omega^* \ge 0$ and $\psi_j \ge 0$ for all integers $j \ge 1$. The coefficients in
the ARCH(∞) representation relate to the parameters of the GARCH model through the
equality

$\frac{\alpha_1 B + \cdots + \alpha_q B^q}{1 - \beta_1 B - \cdots - \beta_p B^p} = \psi_1 B + \psi_2 B^2 + \cdots$  (12.6.4)
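Matching the coefficients of B^k on both sides of this equality gives the recursion psi_k = alpha_k + beta_1 psi_{k-1} + ... + beta_p psi_{k-p}, with alpha_k = 0 for k > q and psi_j = 0 for j <= 0. The sketch below implements that recursion; the function name garch_psi is our own, and the illustration uses the GARCH(1,1) estimates for the CREF returns quoted earlier in the chapter.

```r
# Sketch: ARCH(infinity) weights psi_1, psi_2, ... of a GARCH(p,q) model.
garch_psi <- function(alpha, beta, n_max) {
  q <- length(alpha); p <- length(beta)
  psi <- numeric(n_max)
  for (k in seq_len(n_max)) {
    a_k <- if (k <= q) alpha[k] else 0
    ar_part <- 0
    # Sum beta_i * psi_{k-i} over the lags for which psi_{k-i} is defined.
    for (i in seq_len(min(p, k - 1))) ar_part <- ar_part + beta[i] * psi[k - i]
    psi[k] <- a_k + ar_part
  }
  psi
}

# GARCH(1,1) estimates for the CREF returns.
omega <- 0.01633; alpha <- 0.04414; beta <- 0.91704
psi <- garch_psi(alpha, beta, 10)
omega_star <- omega / (1 - sum(beta))

print(round(psi, 5))
print(all(psi >= 0) && omega_star >= 0)
```

For p = q = 1 the weights reduce to the geometric sequence alpha_1 beta_1^(k-1), which provides a quick check of the recursion.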

If p = 1, then it can be easily checked that $\psi_k = \beta_1 \psi_{k-1}$ for k > q.
Thus, $\psi_j \ge 0$ for all $j \ge 1$ if and only if $\beta_1 \ge 0$ and
$\psi_1 \ge 0, \ldots, \psi_q \ge 0$. For higher GARCH order, the situation is more
complex. Let $\lambda_j$, $1 \le j \le p$, be the roots of the characteristic equation

$1 - \beta_1 x - \cdots - \beta_p x^p = 0$  (12.6.5)

With no loss of generality, we can and shall henceforth assume the convention that

$|\lambda_1| \le |\lambda_2| \le \cdots \le |\lambda_p|$  (12.6.6)

Let $i = \sqrt{-1}$ and $\bar{\lambda}$ denote the complex conjugate of $\lambda$,
$B(x) = 1 - \beta_1 x - \cdots - \beta_p x^p$, and $B^{(1)}$ be the first derivative of
B. We then have the following result.

Result 1: Consider a GARCH(p,q) model where $p \ge 2$. Assume A1, that all the roots of
the equation

$1 - \beta_1 x - \beta_2 x^2 - \cdots - \beta_p x^p = 0$  (12.6.7)

have magnitude greater than 1, and A2, that none of these roots satisfy the equation

$\alpha_1 x + \cdots + \alpha_q x^q = 0$  (12.6.8)

Then the following hold:

(a) $\omega^* \ge 0$ if and only if $\omega \ge 0$.

(b) Assuming the roots $\lambda_1, \ldots, \lambda_p$ are distinct, and
$|\lambda_1| < |\lambda_2|$, then the conditions given in Equation (12.6.9) are
necessary and sufficient for $\psi_k \ge 0$ for all positive integers k:

$\lambda_1$ is real and $\lambda_1 > 1$
$\alpha(\lambda_1) > 0$  (12.6.9)
$\psi_k \ge 0$ for $k = 1, \ldots, k^*$

where $k^*$ is the smallest integer greater than or equal to

$\frac{\log(r_1) - \log[(p - 1) r^*]}{\log(|\lambda_1|) - \log(|\lambda_2|)}$,

$r_j = -\frac{\alpha(\lambda_j)}{B^{(1)}(\lambda_j)}$, for $1 \le j \le p$, and $r^* = \max_{2 \le j \le p} (|r_j|)$  (12.6.10)

For p = 2, the $k^*$ defined in Result 1 can be shown to be q + 1; see Theorem 2 of
Nelson and Cao (1992). If the $k^*$ defined in Equation (12.6.10) is a negative number,
then it can be seen from the proof given in Tsai and Chan (2006) that $\psi_k \ge 0$ for
all positive k.

Tsai and Chan (2006) have also derived some more readily verifiable conditions for
the conditional variances to be always nonnegative.

Result 2: Let the assumptions of Result 1 be satisfied. Then the following hold:

(a) For a GARCH(p,1) model, if $\lambda_j$ is real and $\lambda_j > 1$, for
$j = 1, 2, \ldots, p$, and $\alpha_1 \ge 0$, then $\psi_k \ge 0$ for all positive
integers k.

(b) For a GARCH(p,1) model, if $\psi_k \ge 0$ for all positive integers k, then
$\alpha_1 \ge 0$, $\sum_{j=1}^{p} \lambda_j^{-1} \ge 0$, $\lambda_1$ is real, and
$\lambda_1 > 1$.

(c) For a GARCH(3,1) model, $\psi_k \ge 0$ for all positive integers k if and only if
$\alpha_1 \ge 0$ and either of the following cases hold:

Case 1. All the $\lambda_j$'s are real numbers, $\lambda_1 > 1$, and
$\lambda_1^{-1} + \lambda_2^{-1} + \lambda_3^{-1} \ge 0$.

Case 2. $\lambda_1 > 1$, and $\lambda_2 = \bar{\lambda}_3 = |\lambda_2| e^{i\theta} = a + bi$,
where a and b are real numbers, b > 0, and $0 < \theta < \pi$:

Case 2.1. $\theta = 2\pi/r$ for some integer $r \ge 3$, and $1 < \lambda_1 \le |\lambda_2|$.

Case 2.2. $\theta \notin \{2\pi/r \mid r = 3, 4, \ldots\}$, and
$|\lambda_2|/\lambda_1 \ge x_0 > 1$, where $x_0$ is the largest real root of
$f_{n,\theta}(x) = 0$, and

$f_{n,\theta}(x) = x^{n+2} - x \frac{\sin[(n + 2)\theta]}{\sin\theta} + \frac{\sin[(n + 1)\theta]}{\sin\theta}$  (12.6.11)

where n is the smallest positive integer such that $\sin((n + 1)\theta) < 0$ and
$\sin((n + 2)\theta) > 0$.

(d) For a GARCH(3,1) model, if $\lambda_2 = \bar{\lambda}_3 = |\lambda_2| e^{i\theta} = a + bi$,
where a and b are real numbers, b > 0, and $a \ge \lambda_1 > 1$, then $\psi_k \ge 0$ for
all positive integers k.

(e) For a GARCH(4,1) model, if the $\lambda_j$'s are real for $1 \le j \le 4$, then a
necessary and sufficient condition for $\{\psi_k\}_{k=0}^{\infty}$ to be nonnegative is
that $\alpha_1 \ge 0$,
$\lambda_1^{-1} + \lambda_2^{-1} + \lambda_3^{-1} + \lambda_4^{-1} \ge 0$, and
$\lambda_1 > 1$.

Note that $x_0$ is the only real root of Equation (12.6.11) that is greater than or equal
to 1. Also, Tsai and Chan (2006) proved that if the ARCH coefficients ($\alpha$'s) of a
GARCH(p,q) model are all nonnegative, the model has nonnegative conditional variances if
the nonnegativity property holds for the associated GARCH(p,1) models with a nonnegative
$\alpha_1$ coefficient.

12.7  Some  Extensions  of  the  GARCH  Model 


The GARCH model may be generalized in several directions. First, the GARCH model assumes
that the conditional mean of the time series is zero. Even for financial time series,
this strong assumption need not always hold. In the more general case, the conditional
mean structure may be modeled by some ARMA(m, v) model, with the white noise term of the
ARMA model modeled by some GARCH(p, q) model. Specifically, let $\{Y_t\}$ be a time
series given by (now we switch to using the notation $Y_t$ to denote a general time
series)

$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_m Y_{t-m} + \theta_0 + e_t + \theta_1 e_{t-1} + \cdots + \theta_v e_{t-v}$
$e_t = \sigma_{t|t-1} \varepsilon_t$
$\sigma^2_{t|t-1} = \omega + \alpha_1 e_{t-1}^2 + \cdots + \alpha_q e_{t-q}^2 + \beta_1 \sigma^2_{t-1|t-2} + \cdots + \beta_p \sigma^2_{t-p|t-p-1}$  (12.7.1)

and where we have used the plus convention in the MA parts of the model. The ARMA orders
can be identified based on the time series $\{Y_t\}$, whereas the GARCH orders may be
identified based on the squared residuals from the fitted ARMA model. Once the orders are
identified, full maximum likelihood estimation for the ARMA + GARCH model can be carried
out by maximizing the likelihood function as defined in Equation (12.4.4) on page 298 but
with $r_t$ there replaced by $e_t$ that are recursively computed according to Equation
(12.7.1). The maximum likelihood estimators of the ARMA parameters are approximately
independent of their GARCH counterparts if the innovations $\varepsilon_t$ have a
symmetric distribution (for example, a normal or t-distribution) and their standard
errors are approximately given by those in the pure ARMA case. Likewise, the GARCH
parameter estimators enjoy distributional results similar to those for the pure GARCH
case. However, the ARMA estimators and the GARCH estimators are correlated if the
innovations have a skewed distribution. In the next section, we illustrate the ARMA +
GARCH model with the daily exchange rates of the U.S. dollar to the Hong Kong dollar.
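To make the two-stage identification idea concrete, the following sketch simulates an AR(1) + GARCH(1,1) series in base R, fits the AR(1) part with arima(), and inspects the sample ACF of the squared residuals. All parameter values are hypothetical and chosen only for illustration; this is not the full maximum likelihood procedure described above.

```r
# Sketch: simulate AR(1) + GARCH(1,1), then look at squared ARMA residuals.
set.seed(2024)
n <- 500
phi <- 0.3; omega <- 0.05; alpha <- 0.1; beta <- 0.8   # hypothetical parameters

eps  <- rnorm(n)                      # innovations
e    <- numeric(n)                    # ARMA white noise term with GARCH variance
sig2 <- numeric(n)                    # conditional variances
y    <- numeric(n)                    # observed series

sig2[1] <- omega / (1 - alpha - beta) # start at the stationary variance
e[1] <- sqrt(sig2[1]) * eps[1]
y[1] <- e[1]
for (i in 2:n) {
  sig2[i] <- omega + alpha * e[i - 1]^2 + beta * sig2[i - 1]
  e[i] <- sqrt(sig2[i]) * eps[i]
  y[i] <- phi * y[i - 1] + e[i]
}

# Stage 1: fit the conditional mean; Stage 2: examine the squared residuals.
fit <- arima(y, order = c(1, 0, 0), include.mean = FALSE)
res <- residuals(fit)
print(coef(fit))
print(acf(res^2, lag.max = 10, plot = FALSE))
```

Autocorrelation in the squared residuals, with little autocorrelation in the residuals themselves, is the pattern that points to a GARCH component for the noise term.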

Another direction of generalization concerns nonlinearity in the volatility process.
For financial data, this is motivated by a possible asymmetric market response that may,
for example, react more strongly to a negative return than a positive return of the same
magnitude. The idea can be simply illustrated in the setting of an ARCH(1) model, where
the asymmetry can be modeled by specifying that

$\sigma^2_{t|t-1} = \omega + \alpha e_{t-1}^2 + \gamma \min(e_{t-1}, 0)^2$  (12.7.2)

Such a model is known as a GJR model, a variant of which allows the threshold to be
unknown and other than 0. See Tsay (2005) for other useful extensions of the GARCH
models.
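The asymmetry in Equation (12.7.2) is easy to see numerically. In this sketch the parameter values are hypothetical, and the function name gjr_arch1_var is our own.

```r
# Sketch: conditional variance of the asymmetric ARCH(1) (GJR-type) specification.
gjr_arch1_var <- function(e_prev, omega, alpha, gamma) {
  # pmin(e_prev, 0) is nonzero only for negative shocks.
  omega + alpha * e_prev^2 + gamma * pmin(e_prev, 0)^2
}

omega <- 0.1; alpha <- 0.2; gamma <- 0.3   # hypothetical parameter values

# A negative and a positive shock of the same magnitude.
v <- gjr_arch1_var(c(-1, 1), omega, alpha, gamma)
print(v)   # 0.6 after the negative shock, 0.3 after the positive shock
```

With gamma > 0, a negative shock raises next-period volatility by more than a positive shock of the same size; gamma = 0 recovers the symmetric ARCH(1) model.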

12.8  Another  Example:  The  Daily  USD/HKD  Exchange  Rates 


As an illustration for the ARIMA + GARCH model, we consider the daily USD/HKD (U.S.
dollar to Hong Kong dollar) exchange rate from January 1, 2005 to March 7, 2006,
altogether 431 days of data. The returns of the daily exchange rates are shown in
Exhibit 12.33 and appear to be stationary, although volatility clustering is evident in
the plot.


Exhibit 12.33 Daily Returns of USD/HKD Exchange Rate: 1/1/05-3/7/06

[Figure: daily return plotted against Day]

> data(usd.hkd)
> plot(ts(usd.hkd$hkrate,freq=1),type='l',xlab='Day',
ylab='Return')


It is interesting to note that the need for incorporating ARCH in the data is also
supported by the McLeod-Li test applied to the residuals of the AR(1) + outlier model;
see below for further discussion of the additive outlier. Exhibit 12.34 shows that the
tests are all significant when the number of lags of the autocorrelations of the squared
residuals ranges from 1 to 26, displaying strong evidence of conditional
heteroscedasticity.

Exhibit 12.34 McLeod-Li Test Statistics for the USD/HKD Exchange Rate

[Figure: McLeod-Li test p-values plotted against Lag]

> attach(usd.hkd)
> McLeod.Li.test(arima(hkrate,order=c(1,0,0),
xreg=data.frame(outlier1)))


An AR(1) + GARCH(3,1) model was fitted to the (raw) return data with an additive outlier
one day after July 22, 2005, the date when China revalued the yuan by 2.1% and adopted a
floating-rate system for it. The outlier is shaded in gray in Exhibit 12.33. The
intercept term in the conditional mean function was found to be insignificantly
different from zero and hence is omitted from the model. Thus we take the returns to
have zero mean unconditionally. The fitted model has an AIC = -2070.9, the smallest
among various competing (weakly) stationary models; see Exhibit 12.35. Interestingly,
for lower GARCH orders ($p \le 2$), the fitted models are nonstationary, but the fitted
models are largely stationary when the GARCH order is higher than 2. As the data appear
to be stationary, we choose the AR(1) + GARCH(3,1) model as the final model.

The AR + GARCH models partially reported in Exhibit 12.35 were fitted using the Proc
Autoreg routine in the SAS software.¹ We used the default option of imposing the
Nelson-Cao inequality constraints so that the GARCH conditional variance process is
nonnegative. However, the inequality constraints so imposed are only necessary and
sufficient for the nonnegativity of the conditional variances of a GARCH(p,q) model for
$p \le 2$. For higher-order GARCH models, Proc Autoreg imposes the constraints that
(1) $\psi_k \ge 0$, $1 \le k \le \max(q - 1, p) + 1$ and (2) the nonnegativity of the
in-sample conditional variances; see the SAS 9.1.3 Help and Documentation manual. Hence,
higher-order GARCH models estimated by Proc Autoreg with the Nelson-Cao option need not
have nonnegative conditional variances with probability one.

¹ Proc Autoreg of SAS has the option of imposing the Nelson-Cao inequality constraint in
the GARCH model, hence it is used here.




Exhibit 12.35 AIC Values for Various Fitted Models for the Daily Returns of
the USD/HKD Exchange Rate

AR order   GARCH order (p)   ARCH order (q)   AIC        Stationarity
0          3                 1                -1915.3    nonstationary
1          1                 1                -2054.3    nonstationary
1          1                 2                -2072.5    nonstationary
1          1                 3                -2051.0    nonstationary
1          2                 1                -2062.2    nonstationary
1          2                 2                -2070.5    nonstationary
1          2                 3                -2059.2    nonstationary
1          3                 1                -2070.9    stationary
1          3                 2                -2064.8    stationary
1          3                 3                -2062.8    stationary
1          4                 1                -2061.7    nonstationary
1          4                 2                -2054.8    stationary
1          4                 3                -2062.4    stationary
2          3                 1                -2066.6    stationary

For the Hong Kong exchange rate data, the fitted model from Proc Autoreg is listed in
Exhibit 12.37 with the estimated conditional variances shown in Exhibit 12.36. Note that
the GARCH2 ($\beta_2$) coefficient estimate is negative.


Exhibit 12.36 Estimated Conditional Variances of the Daily Returns of
USD/HKD Exchange Rate from the Fitted
AR(1) + GARCH(3,1) Model

[Figure: estimated conditional variance plotted against Day]

> plot(ts(usd.hkd$v,freq=1),type='l',xlab='Day',
ylab='Conditional Variance')

Since both the intercept and the ARCH coefficient are positive, we can apply part (c) of
Result 2 to check whether or not the conditional variance process defined by the fitted
model is always nonnegative. The characteristic equation
$1 - \beta_1 x - \beta_2 x^2 - \beta_3 x^3 = 0$ admits three roots equal to 1.153728 and
$-0.483294 \pm 1.221474i$. Thus $\lambda_1 = 1.153728$ and
$|\lambda_2|/\lambda_1 = 1.138579$. Based on numerical computations, n in Equation
(12.6.11) turns out to be 2 and Equation (12.6.11) has one real root equal to 1.1385751,
which is strictly less than $1.138579 = |\lambda_2|/\lambda_1$. Hence, we can conclude
that the fitted model always results in nonnegative conditional variances.
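These numerical computations can be retraced in base R. The sketch below starts from the rounded GARCH coefficient estimates reported in Exhibit 12.37, so the last digits may differ slightly from the values quoted in the text. It also assumes that the function in Equation (12.6.11) has the form f(x) = x^(n+2) - x sin((n+2)θ)/sin θ + sin((n+1)θ)/sin θ, which is the form written out in the code.

```r
# Sketch: checking Case 2.2 of Result 2(c) for the fitted GARCH(3,1) part.
beta <- c(0.3066, -0.09400, 0.5023)   # rounded estimates of beta_1, beta_2, beta_3

# Roots of 1 - beta_1 x - beta_2 x^2 - beta_3 x^3 = 0 (coefficients in increasing order).
roots <- polyroot(c(1, -beta))
lambda1 <- Re(roots[which.min(abs(Im(roots)))])   # the real root
lambda2 <- roots[which.max(Im(roots))]            # complex root with positive imaginary part
theta <- Arg(lambda2)
ratio <- Mod(lambda2) / lambda1

# Smallest positive integer n with sin((n+1)theta) < 0 and sin((n+2)theta) > 0.
n <- 1
while (!(sin((n + 1) * theta) < 0 && sin((n + 2) * theta) > 0)) n <- n + 1

# Assumed form of f_{n,theta}(x); its root on [1, 2] is x0.
f <- function(x) x^(n + 2) - x * sin((n + 2) * theta) / sin(theta) +
  sin((n + 1) * theta) / sin(theta)
x0 <- uniroot(f, lower = 1, upper = 2, tol = 1e-12)$root

print(c(lambda1 = lambda1, ratio = ratio, n = n, x0 = x0), digits = 8)
print(ratio >= x0)
```

The margin between the ratio and x0 is very small here, which suggests that the fitted model sits close to the boundary of the nonnegativity region.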
Exhibit 12.37 Fitted AR(1) + GARCH(3,1) Model for Daily Returns of
USD/HKD Exchange Rate

Coefficient            Estimate       Std. error     t-ratio    p-value
AR1                    0.1635         0.005892       21.29      0.0022
ARCH0 ($\omega$)       2.374×10^-5    6.93×10^-6     3.42       0.0006
ARCH1 ($\alpha_1$)     0.2521         0.0277         9.09       < 0.0001
GARCH1 ($\beta_1$)     0.3066         0.0637         4.81       < 0.0001
GARCH2 ($\beta_2$)     -0.09400       0.0391         -2.41      0.0161
GARCH3 ($\beta_3$)     0.5023         0.0305         16.50      < 0.0001
Outlier                -0.1255        0.00589        -21.29     < 0.0001

> SAS code:
data hkex; infile 'hkrate.dat'; input hkrate;
outlier1=0;
day+1; if day=203 then outlier1=1;
proc autoreg data=hkex;
model hkrate=outlier1 /noint nlag=1 garch=(p=3,q=1)
maxiter=200 archtest;
/*hetero outlier /link=linear;*/
output out=a cev=v residual=r;
run;


12.9  Summary 


This chapter began with a brief description of some terms and issues associated with
financial time series. Autoregressive conditional heteroscedasticity (ARCH) models were
then introduced in an attempt to model the changing variance of a time series. The ARCH
model of order 1 was thoroughly explored from identification through parameter
estimation and prediction. These models were then generalized to the generalized
autoregressive conditional heteroscedasticity, GARCH(p,q), model. The GARCH models were
also thoroughly explored with respect to identification, maximum likelihood estimation,
prediction, and model diagnostics. Examples with both simulated and real time series
data were used to illustrate the ideas.

Exercises


12.1  Display  the  time  sequence  plot  of  the  absolute  returns  for  the  CREF  data.  Repeat 
the  plot  with  the  squared  returns.  Comment  on  the  volatility  patterns  observed  in 
these  plots.  (The  data  are  in  file  named  CREF.) 

12.2 Plot the time sequence plot of the absolute returns for the USD/HKD exchange
rate data. Repeat the plot with the squared returns. Comment on the volatility
patterns observed in these plots. (The data are in the file named usd.hkd.)

12.3 Use the definition $\eta_t = r_t^2 - \sigma^2_{t|t-1}$ [Equation (12.2.4) on page 287]
and show that $\{\eta_t\}$ is a serially uncorrelated sequence. Show also that $\eta_t$
is uncorrelated with past squared returns, that is, show that
$Corr(\eta_t, r_{t-k}^2) = 0$ for k > 0.

12.4 Substituting $\sigma^2_{t|t-1} = r_t^2 - \eta_t$ into Equation (12.2.2) on page 285,
show the algebra that leads to Equation (12.2.5) on page 287.

12.5  Verify  Equation  (12.2.8)  on  page  288. 

12.6 Without doing any theoretical calculations, order the kurtosis values of the
following four distributions in ascending order: the t-distribution with 10 DF, the
t-distribution with 30 DF, the uniform distribution on [-1,1], and the normal
distribution with mean 0 and variance 4. Explain your answer.

12.7 Simulate a GARCH(1,1) process with $\alpha$ = 0.1 and $\beta$ = 0.8 and of length
500. Plot the time series and inspect its sample ACF, PACF, and EACF. Are the data
consistent with the assumption of white noise?

(a)  Square  the  data  and  identify  a  GARCH  model  for  the  raw  data  based  on  the 
sample  ACF,  PACF,  and  EACF  of  the  squared  data. 

(b)  Identify  a  GARCH  model  for  the  raw  data  based  on  the  sample  ACF,  PACF 
and  EACF  of  the  absolute  data.  Discuss  and  reconcile  any  discrepancy 
between  the  tentative  model  identified  with  the  squared  data  and  that  with  the 
absolute  data. 

(c) Perform the McLeod-Li test on your simulated series. What do you conclude?

(d)  Repeat  the  exercise  but  now  using  only  the  first  200  simulated  data.  Discuss 
your  findings. 

12.8  The  file  cref.bond  contains  the  daily  price  of  the  CREF  bond  fund  from  August 
26,  2004  to  August,  15,  2006.  These  data  are  available  only  on  trading  days,  but 
proceed  to  analyze  the  data  as  if  they  were  sampled  regularly. 

(a)  Display  the  time  sequence  plot  of  the  daily  bond  price  data  and  comment  on 
the  main  features  in  the  data. 

(b) Compute the daily bond returns by log-transforming the data and then
computing the first differences of the transformed data. Plot the daily bond returns,
and comment on the result.

(c) Perform the McLeod-Li test on the returns series. What do you conclude?

(d) Show that the returns of the CREF bond price series appear to be
independently and identically distributed and not just serially uncorrelated; that is,
there is no discernible volatility clustering.

12.9 The daily returns of Google stock from August 20, 2004 to September 13, 2006
are stored in the file named google.

(a)  Display  the  time  sequence  plot  for  the  return  data  and  show  that  the  data  are 
essentially  uncorrelated  over  time. 

(b) Compute the mean of the Google daily returns. Does it appear to be
significantly different from 0?

(c)  Perform  the  McLeod-Li  test  on  the  Google  daily  returns  series.  What  do  you 
conclude? 

(d) Identify a GARCH model for the Google daily return data. Estimate the
identified model and perform model diagnostics with the fitted model.

(e)  Draw  and  comment  on  the  time  sequence  plot  of  the  estimated  conditional 
variances. 

(f) Plot the QQ normal plot for the standardized residuals from the fitted model.
Do the residuals appear to be normal? Discuss the effects of the normality on
the model fit, for example, regarding the computation of the confidence
interval.

(g) Construct a 95% confidence interval for $b_1$.

(h)  What  are  the  stationary  mean  and  variance  according  to  the  fitted  GARCH 
model?  Compare  them  with  those  of  the  data. 

(i) Based on the GARCH model, construct the 95% prediction intervals for the
h-step-ahead forecasts, for h = 1, 2, ..., 5.

12.10 In Exercise 11.21 on page 276, we investigated the existence of outliers with the
logarithms of monthly oil prices within the framework of an IMA(1,1) model.
Here, we explore the effects of “outliers” on the GARCH specification. The data
are in the file named oil.price.

(a) Based on the sample ACF, PACF, and EACF of the absolute and squared
residuals from the fitted IMA(1,1) model (without outlier adjustment), show
that a GARCH(1,1) model may be appropriate for the residuals.

(b) Fit an IMA(1,1) + GARCH(1,1) model to the logarithms of monthly oil
prices.

(c) Draw the time sequence plot for the standardized residuals from the fitted
IMA(1,1) + GARCH(1,1) model. Are there any outliers?

(d)  For  the  log  oil  prices,  fit  an  IMA(1,1)  model  with  two  IOs  at  t  =  2  and  t  =  56 
and  an  AO  at  t  =  8.  Show  that  the  residuals  from  the  IMA  plus  outlier  model 
appear  to  be  independently  and  identically  distributed  and  not  just  serially 
uncorrelated;  that  is,  there  is  no  discernible  volatility  clustering. 

(e)  Between  the  outlier  and  the  GARCH  model,  which  one  do  you  think  is  more 
appropriate  for  the  oil  price  data?  Explain  your  answer. 

Appendix I: Formulas for the Generalized Portmanteau Tests


We first present the formula for $Q = (q_{i,j})$ for the case where the portmanteau test is
based on the squared standardized residuals. Readers may consult Li and Mak (1994)
for proofs of the formulas. Let $\theta$ denote the vector of GARCH parameters. For example,
for a GARCH(1,1) model,

$$\theta = \begin{bmatrix} \omega \\ \alpha \\ \beta \end{bmatrix} \qquad (12.I.1)$$

Write the $i$th component of $\theta$ as $\theta_i$ so that $\theta_1 = \omega$, $\theta_2 = \alpha$, and $\theta_3 = \beta$ for the GARCH(1,1)
model. In the general case, let $k = p + q + 1$ be the number of GARCH parameters. Let $J$
be an $m \times k$ matrix whose $(i, j)$th element equals

(12.I.2)


and $\Lambda$ be the $k \times k$ covariance matrix of the approximate normal distribution of the
maximum likelihood estimator of $\theta$ for the model assuming normal innovations; see
Section 12.4. Let $Q = (q_{i,j})$ be the matrix of the q's appearing in the quadratic form of
the generalized portmanteau test. It can be shown that the matrix $Q$ equals

where $I$ is the $m \times m$ identity matrix, $\kappa$ is the (excess) kurtosis of the innovations, $J^T$ is
the transpose of $J$, and the superscript -1 denotes the matrix inverse.

Next,  we  present  the  formulas  for  the  case  where  the  tests  are  computed  based  on 
the  absolute  standardized  residuals.  In  this  case,  the  (i,j  )th  element  of  the  J  matrix 
equals 

l  oa .  | .  j 

T - (12J-4) 

t\t-\  j 

where $\tau = E(|\varepsilon_t|)$, and $Q$ equals


-v 


with  v  =  £(|e^|)  . 

Chapter 13

Introduction  to  Spectral  Analysis 


Historically, spectral analysis began with the search for “hidden periodicities” in time
series data. Chapter 3 discussed fitting cosine trends at various known frequencies to
series with strong cyclical trends. In addition, the random cosine wave example in
Chapter 2 on page 18 showed that it is possible for a stationary process to look very
much like a deterministic cosine wave. We hinted in Chapter 3 that by using enough
different frequencies with enough different amplitudes (and phases) we might be able to
model nearly any stationary series.† This chapter pursues those ideas further with an
introduction to spectral analysis. Previous to this chapter, we concentrated on analyzing
the correlation properties of time series. Such analysis is often called time domain
analysis. When we analyze frequency properties of time series, we say that we are working
in the frequency domain. Frequency domain analysis or spectral analysis has been
found to be especially useful in acoustics, communications engineering, geophysical
science, and biomedical science, for example.

13.1  Introduction 


Recall from Chapter 3 the cosine curve with equation‡

$$R\cos(2\pi f t + \Phi) \qquad (13.1.1)$$

Remember that $R$ (> 0) is the amplitude, $f$ the frequency, and $\Phi$ the phase of the curve.
Since the curve repeats itself exactly every $1/f$ time units, $1/f$ is called the period of the
cosine wave.

Exhibit 13.1 displays two discrete-time cosine curves with time running from 1 to
96. We would only see the discrete points, but the connecting line segments are added to
help our eyes follow the pattern. The frequencies are 4/96 and 14/96, respectively. The
lower-frequency curve has a phase of zero, but the higher-frequency curve is shifted by a
phase of $0.6\pi$.

Exhibit 13.2 shows the graph of a linear combination of the two cosine curves with
a multiplier of 2 on the low-frequency curve and a multiplier of 3 on the
higher-frequency curve and a phase of $0.6\pi$; that is,


† See Exercise 2.25 on page 23, in particular.

‡ In this chapter, we use notation slightly different from that in Chapter 3.

Exhibit 13.1 Cosine Curves with n = 96 and Two Frequencies and Phases

[Figure: Cosines plotted against t]

> win.graph(width=4.875,height=2.5,pointsize=8)
> t=1:96; cos1=cos(2*pi*t*4/96); cos2=cos(2*pi*(t*14/96+.3))
> plot(t,cos1,type='o',ylab='Cosines')
> lines(t,cos2,lty='dotted',type='o',pch=4)

$$Y_t = 2\cos\left(2\pi\frac{4}{96}t\right) + 3\cos\left[2\pi\left(\frac{14}{96}t + 0.3\right)\right] \qquad (13.1.2)$$

Now the periodicity is somewhat hidden. Spectral analysis provides tools for
discovering the “hidden” periodicities quite easily. Of course, there is nothing random in
this time series.

> y=2*cos1+3*cos2; plot(t,y,type='o',ylab=expression(y[t]))

As we saw earlier, Equation (13.1.1) is not convenient for estimation because the
parameters $R$ and $\Phi$ do not enter the expression linearly. Instead, we use a trigonometric
identity to reparameterize Equation (13.1.1) as

$$R\cos(2\pi f t + \Phi) = A\cos(2\pi f t) + B\sin(2\pi f t) \qquad (13.1.3)$$

where

$$R = \sqrt{A^2 + B^2}, \qquad \Phi = \operatorname{atan}(-B/A) \qquad (13.1.4)$$

and, conversely,

$$A = R\cos(\Phi), \qquad B = -R\sin(\Phi) \qquad (13.1.5)$$

Then, for a fixed frequency $f$, we can use $\cos(2\pi f t)$ and $\sin(2\pi f t)$ as predictor variables
and fit the A's and B's from the data using ordinary least squares regression.
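
As a minimal sketch of this regression, the base R code below reuses the series of Equation (13.1.2) purely for illustration and fits the cosine-sine pair at f = 14/96 by least squares, then converts the estimates back to amplitude and phase. The variable names are hypothetical. It uses atan2(-B, A) rather than atan(-B/A); that substitution is an assumption made here so that the phase falls in the correct quadrant when A is negative.

```r
# Series from Equation (13.1.2): two cosines, no noise
t <- 1:96
y <- 2 * cos(2 * pi * t * 4 / 96) + 3 * cos(2 * pi * (t * 14 / 96 + 0.3))

# Least squares fit of one cosine-sine pair at the fixed frequency f
f <- 14 / 96
cosf <- cos(2 * pi * f * t)
sinf <- sin(2 * pi * f * t)
fit <- lm(y ~ cosf + sinf)
A <- unname(coef(fit)["cosf"])
B <- unname(coef(fit)["sinf"])

# Back to amplitude and phase, Equation (13.1.4)
R <- sqrt(A^2 + B^2)
Phi <- atan2(-B, A)  # quadrant-aware version of atan(-B/A)
c(A = A, B = B, R = R, Phi.over.pi = Phi / pi)
```

The recovered amplitude is 3 and the recovered phase is 0.6 times pi, matching the higher-frequency component. The plain atan(-B/A) would return an angle that differs from this by pi because A is negative here.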

A general linear combination of m cosine curves with arbitrary amplitudes,
frequencies, and phases could be written as†

$$Y_t = A_0 + \sum_{j=1}^{m}\left[A_j\cos(2\pi f_j t) + B_j\sin(2\pi f_j t)\right] \qquad (13.1.6)$$

Ordinary least squares regression can be used to fit the A's and B's, but when the
frequencies of interest are of a special form, the regressions are especially easy. Suppose
that n is odd and write n = 2k + 1. Then the frequencies of the form 1/n, 2/n, ..., k/n
(= 1/2 - 1/(2n)) are called the Fourier frequencies. The cosine and sine predictor
variables at these frequencies (and at f = 0) are known to be orthogonal,‡ and the least
squares estimates are simply

$$\hat{A}_0 = \bar{Y} \qquad (13.1.7)$$

$$\hat{A}_j = \frac{2}{n}\sum_{t=1}^{n} Y_t\cos(2\pi t j/n) \quad\text{and}\quad \hat{B}_j = \frac{2}{n}\sum_{t=1}^{n} Y_t\sin(2\pi t j/n) \qquad (13.1.8)$$

If the sample size is even, say n = 2k, Equations (13.1.7) and (13.1.8) still apply for
j = 1, 2, ..., k - 1, but

$$\hat{A}_k = \frac{1}{n}\sum_{t=1}^{n}(-1)^t Y_t \quad\text{and}\quad \hat{B}_k = 0 \qquad (13.1.9)$$

Note that here $f_k = k/n = 1/2$.

If we were to apply these formulas to the series shown in Exhibit 13.2, we would
obtain perfect results. That is, at frequency $f_4 = 4/96$, we obtain $\hat{A}_4 = 2$ and $\hat{B}_4 = 0$, and
at frequency $f_{14} = 14/96$, we obtain $\hat{A}_{14} = -0.927051$ and $\hat{B}_{14} = -2.85317$. We would
obtain estimates of zero for the regression coefficients at all other frequencies. These


† The $A_0$ term can be thought of as the coefficient of the cosine curve at zero frequency,
which is identically one, and the $B_0$ can be thought of as the coefficient on the sine curve at
frequency zero, which is identically zero and hence does not appear.

‡ See Appendix J on page 349 for more information on the orthogonality properties of the
cosines and sines.

results obtain because there is no randomness in this series and the cosine-sine fits are
exact.

Note also that any series of any length n, whether deterministic or stochastic and
with or without any true periodicities, can be fit perfectly by the model in Equation
(13.1.6) by choosing m = n/2 if n is even and m = (n - 1)/2 if n is odd. There are then n
parameters to adjust (estimate) to fit the series of length n.
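
The estimates quoted above for the series of Exhibit 13.2, and the perfect fit with n parameters, can be checked directly. The following is a sketch in base R that evaluates Equations (13.1.7) to (13.1.9) as plain sums (n = 96 is even, so k = 48) and then rebuilds the series from the n coefficients. Variable names such as A.hat and y.fit are illustrative only.

```r
t <- 1:96
n <- length(t)            # even sample size, n = 2k
k <- n / 2
y <- 2 * cos(2 * pi * t * 4 / 96) + 3 * cos(2 * pi * (t * 14 / 96 + 0.3))

# Equations (13.1.7)-(13.1.9)
A0 <- mean(y)
j <- 1:(k - 1)
A.hat <- sapply(j, function(jj) 2 / n * sum(y * cos(2 * pi * t * jj / n)))
B.hat <- sapply(j, function(jj) 2 / n * sum(y * sin(2 * pi * t * jj / n)))
Ak <- sum((-1)^t * y) / n

# Coefficients at j = 4 and j = 14; all others are zero up to rounding
round(c(A.hat[4], B.hat[4], A.hat[14], B.hat[14]), 6)
max(abs(A.hat[-c(4, 14)]))

# Rebuild the series from all n coefficients, Equation (13.1.6) with m = n/2
y.fit <- A0 + Ak * cos(pi * t)
for (jj in j) {
  y.fit <- y.fit + A.hat[jj] * cos(2 * pi * t * jj / n) +
    B.hat[jj] * sin(2 * pi * t * jj / n)
}
max(abs(y - y.fit))       # zero up to rounding error
```

Replacing y by any other series of length 96, for example rnorm(96), should leave the final reconstruction error at rounding level, which is the point of the remark about n parameters.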

13.2  The  Periodogram 


For odd sample sizes with n = 2k + 1, the periodogram I at frequency f = j/n for j = 1,
2, ..., k, is defined to be

$$I\!\left(\frac{j}{n}\right) = \frac{n}{2}\left(\hat{A}_j^2 + \hat{B}_j^2\right) \qquad (13.2.1)$$

If the sample size is even and n = 2k, Equations (13.1.7) and (13.1.8) still give the A's
and B's and Equation (13.2.1) gives the periodogram for j = 1, 2, ..., k - 1. However, at
the extreme frequency f = k/n = 1/2, Equations (13.1.9) apply and

$$I\!\left(\tfrac{1}{2}\right) = n(\hat{A}_k)^2 \qquad (13.2.2)$$

Since the periodogram is proportional to the sum of squares of the regression
coefficients associated with frequency f = j/n, the height of the periodogram shows the relative
strength of cosine-sine pairs at various frequencies in the overall behavior of the series.
Another interpretation is in terms of an analysis of variance. The periodogram I(j/n) is
the sum of squares with two degrees of freedom associated with the coefficient pair
$(\hat{A}_j, \hat{B}_j)$ at frequency j/n, so we have

$$\sum_{j=1}^{n}(Y_j - \bar{Y})^2 = \sum_{j=1}^{k} I\!\left(\frac{j}{n}\right) \qquad (13.2.3)$$

when n = 2k + 1 is odd. A similar result holds when n is even but there is a further term
in the sum, I(1/2), with one degree of freedom.

For long series, the computation of a large number of regression coefficients might
be intensive. Fortunately, quick, efficient numerical methods based on the fast Fourier
transform (FFT) have been developed that make the computations feasible for very long
time series.†
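
To make the link between the regression coefficients and the FFT concrete, here is a small sketch in base R. It takes the periodogram ordinate at j/n to be (n/2) times the sum of the squared coefficients, the scaling consistent with Equation (13.2.2), and compares it with the squared modulus of R's fft() output. The white noise test series and the seed are arbitrary choices made for this illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 96
k <- n / 2
t <- 1:n
y <- rnorm(n)

# Periodogram ordinates from the regression coefficients of Equation (13.1.8)
I.reg <- sapply(1:(k - 1), function(j) {
  A <- 2 / n * sum(y * cos(2 * pi * t * j / n))
  B <- 2 / n * sum(y * sin(2 * pi * t * j / n))
  n / 2 * (A^2 + B^2)
})

# The same ordinates from one FFT: element h + 1 of fft(y) is frequency h/n
Fy <- fft(y)
I.fft <- 2 / n * Mod(Fy[2:k])^2
I.half <- Mod(Fy[k + 1])^2 / n   # n * (A_k)^2 at frequency 1/2

all.equal(I.reg, I.fft)
# Analysis of variance identity for even n: total sum of squares
# equals the sum of the periodogram ordinates, including I(1/2)
all.equal(sum((y - mean(y))^2), sum(I.fft) + I.half)
```

Both comparisons should return TRUE. The FFT route needs a single transform, whereas the regression route recomputes two sums of length n for every frequency.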

Exhibit 13.3 displays a graph of the periodogram for the time series in Exhibit 13.2.
The heights show the presence and relative strengths of the two cosine-sine components
quite clearly. Note also that the frequencies 4/96 ≈ 0.04167 and 14/96 ≈ 0.14583 have
been marked on the frequency axis.


† Often based on the Cooley-Tukey FFT algorithm; see Gentleman and Sande (1966).

Exhibit 13.3 Periodogram of the Series in Exhibit 13.2

[Figure: periodogram plotted against Frequency from 0.0 to 0.5, with 0.04167 and 0.14583 marked on the axis]

> periodogram(y); abline(h=0); axis(1,at=c(0.04167,.14583))

Does the periodogram work just as well when we do not know where or even if
there are cosines in the series? What if the series contains additional “noise”? To
illustrate, we generate a time series using randomness to select the frequencies, amplitudes,
and phases and with additional additive white noise. The two frequencies are randomly
chosen without replacement from among 1/96, 2/96, ..., 47/96. The A's and B's are
selected independently from normal distributions with means of zero and standard
deviations of 2 for the first component and 3 for the second. Finally, a normal white noise
series, $\{W_t\}$, with zero mean and standard deviation 1, is chosen independently of the
A's and B's and added on. The model is†

$$Y_t = A_1\cos(2\pi f_1 t) + B_1\sin(2\pi f_1 t) + A_2\cos(2\pi f_2 t) + B_2\sin(2\pi f_2 t) + W_t \qquad (13.2.4)$$

and  Exhibit  13.4  displays  a  time  series  of  length  96  simulated  from  this  model.  Once 
more,  the  periodicities  are  not  obvious  until  we  view  the  periodogram  shown  in  Exhibit 
13.5. 


† This model is often described as a signal plus noise model. The signal could be
deterministic (with unknown parameters) or stochastic.

Exhibit 13.4 Time Series with “Hidden” Periodicities

[Figure: y_t plotted against t]

> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(134); t=1:96; integer=sample(48,2)
> freq1=integer[1]/96; freq2=integer[2]/96
> A1=rnorm(1,0,2); B1=rnorm(1,0,2)
> A2=rnorm(1,0,3); B2=rnorm(1,0,3); w=2*pi*t
> y=A1*cos(w*freq1)+B1*sin(w*freq1)+A2*cos(w*freq2)+
   B2*sin(w*freq2)+rnorm(96,0,1)
> plot(t,y,type='o',ylab=expression(y[t]))


The periodogram clearly shows that the series contains two cosine-sine pairs at
frequencies of about 0.11 and 0.32 and that the higher-frequency component is much
stronger. There are some other very small spikes in the periodogram, apparently caused by
the additive white noise component. (When we checked the simulation in detail, we
found that one frequency was chosen as 10/96 ≈ 0.1042 and the other was selected as
30/96 = 0.3125.)
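
Reading the peak locations off the numerical periodogram values can be sketched in base R as follows. This is not the simulation used for Exhibit 13.4: the two frequencies are fixed at the values reported above, the amplitudes are hypothetical constants rather than random draws, and the periodogram is computed with fft() instead of the periodogram() function used for the exhibits.

```r
set.seed(42)                       # arbitrary seed
n <- 96
t <- 1:n
f1 <- 10 / 96
f2 <- 30 / 96                      # the two frequencies reported in the text
# Hypothetical amplitudes, fixed here instead of drawn at random
y <- 1.5 * cos(2 * pi * f1 * t) - 1 * sin(2 * pi * f1 * t) +
  2 * cos(2 * pi * f2 * t) + 3 * sin(2 * pi * f2 * t) + rnorm(n)

# Periodogram at the Fourier frequencies 1/n, ..., (n/2 - 1)/n via the FFT
freq <- (1:(n / 2 - 1)) / n
pgram <- 2 / n * Mod(fft(y - mean(y))[2:(n / 2)])^2

# Frequencies of the two largest ordinates, strongest first
peaks <- freq[order(pgram, decreasing = TRUE)[1:2]]
peaks
```

With these amplitudes the signal ordinates are far larger than anything unit-variance white noise produces at n = 96, so the two largest ordinates fall at 30/96 and 10/96, with the higher-frequency component the stronger of the two.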


Exhibit  13.5  Periodogram  of  the  Time  Series  Shown  in  Exhibit  13.4 

[Figure: periodogram plotted against Frequency]

> periodogram(y); abline(h=0)

Here is an example of the periodogram for a classic time series from Whittaker and
Robinson (1924).† Exhibit 13.6 displays the time series plot of the brightness
(magnitude) of a particular star at midnight on 600 consecutive nights.


Exhibit  13.6  Variable  Star  Brightness  on  600  Consecutive  Nights 


[Figure: Brightness plotted against Day]

> data(star)
> plot(star,xlab='Day',ylab='Brightness')


Exhibit 13.7 shows the periodogram for this time series. There are two very
prominent peaks in the periodogram. When we inspect the actual numerical values, we find
that the larger peak occurs at frequency f = 21/600 = 0.035. This frequency corresponds
to a period of 600/21 ≈ 28.57, or nearly 29 days. The secondary peak occurs at f =
25/600 ≈ 0.04167, which corresponds to a period of 24 days. The much more modest
nonzero periodogram values near the major peak are likely caused by leakage.

The two sharp peaks suggest a model for this series with just two cosine-sine pairs
with the appropriate frequencies or periods, namely

$$Y_t = \beta_0 + \beta_1\cos(2\pi f_1 t) + \beta_2\sin(2\pi f_1 t) + \beta_3\cos(2\pi f_2 t) + \beta_4\sin(2\pi f_2 t) + e_t \qquad (13.2.5)$$

where $f_1 = 1/29$ and $f_2 = 1/24$. If we estimate this regression model as in Chapter 3, we
obtain highly statistically significant regression coefficients for all five parameters and a
multiple R-square value of 99.9%.

We  will  return  to  this  time  series  in  Section  14.5  on  page  358,  where  we  discuss 
more  about  leakage  and  tapering. 


† An extensive analysis of this series appears throughout Bloomfield (2000).

Exhibit 13.7 Periodogram of the Variable Star Brightness Time Series

[Figure: Variable Star Periodogram plotted against Frequency]

> periodogram(star,ylab='Variable Star Periodogram'); abline(h=0)

Although the Fourier frequencies are special, we extend the definition of the
periodogram to all frequencies in the interval 0 to 1/2 through Equations (13.1.8) and
(13.2.1). Thus we have for $0 \le f \le 1/2$

$$I(f) = \frac{n}{2}\left(\hat{A}_f^2 + \hat{B}_f^2\right) \qquad (13.2.6)$$

where

$$\hat{A}_f = \frac{2}{n}\sum_{t=1}^{n} Y_t\cos(2\pi t f) \quad\text{and}\quad \hat{B}_f = \frac{2}{n}\sum_{t=1}^{n} Y_t\sin(2\pi t f) \qquad (13.2.7)$$

When  viewed  in  this  way,  the  periodogram  is  often  calculated  at  a  grid  of  frequencies 
finer  than  the  Fourier  frequencies,  and  the  plotted  points  are  connected  by  line  segments 
to  display  a  somewhat  smooth  curve. 

Why do we only consider positive frequencies? Because by the even and odd nature
of cosines and sines, any cosine-sine curve with negative frequency, say $-f$, could just as
well be expressed as a cosine-sine curve with frequency $+f$. No generality is lost by
using positive frequencies.†

Secondly, why do we restrict frequencies to the interval from 0 to 1/2? Consider the
graph shown in Exhibit 13.8. Here we have plotted two cosine curves, one with
frequency f = 1/4 and the one shown with dashed lines at frequency f = 3/4. If we only
observe the series at the discrete-time points 0, 1, 2, 3, ..., the two series are identical.
With discrete-time observations, we could never distinguish between these two curves.
We say that the two frequencies 1/4 and 3/4 are aliased with one another. In general, each
frequency $f$ within the interval 0 to 1/2 will be aliased with each frequency of the form


† The definition of Equation (13.2.6) is often used for $-1/2 < f < +1/2$, but the resulting function
is symmetric about zero and no new information is gained from the negative frequencies.
Later in this chapter, we will use both positive and negative frequencies so that certain nice
mathematical relationships hold.

$k \pm f$ for any positive integer $k$, since $\cos(2\pi(k \pm f)t) = \cos(2\pi f t)$ at every integer $t$, and it suffices to limit attention to frequencies within
the interval from 0 to 1/2.


Exhibit  13.8  Illustration  of  Aliasing 


[Figure: Y_t plotted against Discrete Time t, from 0 to 8]

> win.graph(width=4.875,height=2.5,pointsize=8)
> t=seq(0,8,by=.05)
> plot(t,cos(2*pi*t/4),axes=F,type='l',ylab=expression(Y[t]),
   xlab='Discrete Time t')
> axis(1,at=c(1,2,3,4,5,6,7)); axis(1); axis(2); box()
> lines(t,cos(2*pi*t*3/4),lty='dashed',type='l'); abline(h=0)
> points(x=c(0:8),y=cos(2*pi*c(0:8)/4),pch=19)
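
The aliasing of 1/4 and 3/4 in Exhibit 13.8 can also be checked numerically. The short base R sketch below evaluates both cosines at the integer times 0 to 8 and, as a further illustration chosen here, does the same for the frequencies 0.1, 1 - 0.1, and 1 + 0.1.

```r
t <- 0:8                               # discrete observation times
low <- cos(2 * pi * t * 1 / 4)
high <- cos(2 * pi * t * 3 / 4)
d1 <- max(abs(low - high))             # zero up to rounding error

# Further illustration: 0.1, 0.9 = 1 - 0.1 and 1.1 = 1 + 0.1 also coincide
f <- 0.1
d2 <- max(abs(cos(2 * pi * f * t) - cos(2 * pi * (1 - f) * t)))
d3 <- max(abs(cos(2 * pi * f * t) - cos(2 * pi * (1 + f) * t)))
c(d1, d2, d3)
```

All three differences are at rounding level, so at integer sampling times the curves cannot be told apart even though they differ between the sampling times.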


13.3  The  Spectral  Representation  and  Spectral  Distribution 


Consider a time series represented as

$$Y_t = \sum_{j=1}^{m}\left[A_j\cos(2\pi f_j t) + B_j\sin(2\pi f_j t)\right] \qquad (13.3.1)$$

where the frequencies $0 < f_1 < f_2 < \cdots < f_m < 1/2$ are fixed and $A_j$ and $B_j$ are independent
normal random variables with zero means and $\mathrm{Var}(A_j) = \mathrm{Var}(B_j) = \sigma_j^2$. Then a
straightforward calculation shows that $\{Y_t\}$ is stationary† with mean zero and

$$\gamma_k = \sum_{j=1}^{m}\sigma_j^2\cos(2\pi k f_j) \qquad (13.3.2)$$

In particular, the process variance, $\gamma_0$, is a sum of the variances due to each component
at the various fixed frequencies:


† Compare this with Exercise 2.29 on page 24.

$$\gamma_0 = \sum_{j=1}^{m}\sigma_j^2 \qquad (13.3.3)$$
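
The "straightforward calculation" behind Equations (13.3.2) and (13.3.3) can be sketched as follows. It uses only the zero means, the independence and common variance of $A_j$ and $B_j$, and the identity $\cos u\cos v + \sin u\sin v = \cos(u - v)$; all cross terms between different coefficients vanish.

```latex
\begin{aligned}
\mathrm{Cov}(Y_t, Y_{t-k})
 &= \sum_{j=1}^{m}\Big[\mathrm{Var}(A_j)\cos(2\pi f_j t)\cos\big(2\pi f_j (t-k)\big)
    + \mathrm{Var}(B_j)\sin(2\pi f_j t)\sin\big(2\pi f_j (t-k)\big)\Big] \\
 &= \sum_{j=1}^{m}\sigma_j^2\cos\big(2\pi f_j\,[\,t-(t-k)\,]\big)
  = \sum_{j=1}^{m}\sigma_j^2\cos(2\pi k f_j) = \gamma_k
\end{aligned}
```

Because the result does not depend on $t$ and $E(Y_t) = 0$, the process is stationary, and setting $k = 0$ gives the variance decomposition of Equation (13.3.3).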

If for $0 < f < 1/2$ we define two random step functions by

$$a(f) = \sum_{\{j \mid f_j \le f\}} A_j \quad\text{and}\quad b(f) = \sum_{\{j \mid f_j \le f\}} B_j \qquad (13.3.4)$$

then we can write Equation (13.3.1) as

$$Y_t = \int_0^{1/2}\cos(2\pi f t)\,da(f) + \int_0^{1/2}\sin(2\pi f t)\,db(f) \qquad (13.3.5)$$

It turns out that any zero-mean stationary process may be represented as in Equation
(13.3.5).† It shows how stationary processes may be represented as linear combinations
of infinitely many cosine-sine pairs over a continuous frequency band. In general, $a(f)$
and $b(f)$ are zero-mean stochastic processes indexed by frequency on $0 \le f \le 1/2$, each
with uncorrelated‡ increments, and the increments of $a(f)$ are uncorrelated with the
increments of $b(f)$. Furthermore, we have

$$\mathrm{Var}\!\left(\int_{f_1}^{f_2} da(f)\right) = \mathrm{Var}\!\left(\int_{f_1}^{f_2} db(f)\right) = F(f_2) - F(f_1), \text{ say.} \qquad (13.3.6)$$

Equation (13.3.5) is called the spectral representation of the process. The
nondecreasing function $F(f)$ defined on $0 \le f \le 1/2$ is called the spectral distribution function of the
process.

We say that the special process defined by Equation (13.3.1) has a purely discrete
(or line) spectrum and, for $0 \le f \le 1/2$,

$$F(f) = \sum_{\{j \mid f_j \le f\}}\sigma_j^2 \qquad (13.3.7)$$

Here the heights of the jumps in the spectral distribution give the variances associated
with the various periodic components, and the positions of the jumps indicate the
frequencies of the periodic components.
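
As a small illustration of a line spectrum, the base R sketch below builds the step function of Equation (13.3.7) for a hypothetical process with two components. The frequencies 0.1 and 0.3 and the variances 4 and 9 are made up for this example, and the function names F.spec and acov are illustrative.

```r
# Hypothetical line spectrum: frequencies f_j and variances sigma_j^2
fj <- c(0.1, 0.3)
sig2 <- c(4, 9)

# Spectral distribution function of Equation (13.3.7): a step function
# that jumps by sigma_j^2 at each frequency f_j
F.spec <- function(f) sapply(f, function(x) sum(sig2[fj <= x]))
F.spec(c(0.05, 0.1, 0.2, 0.3, 0.5))

# Autocovariances from Equation (13.3.2); the variance acov(0) equals F(1/2)
acov <- function(k) sapply(k, function(kk) sum(sig2 * cos(2 * pi * kk * fj)))
acov(0)
```

The step function is 0 below 0.1, 4 between 0.1 and 0.3, and 13 from 0.3 onward, and 13 is also the process variance.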

In  general,  a  spectral  distribution  function  has  the  properties 


† The proof is beyond the scope of this book. See Cramér and Leadbetter (1967, pp.
128-138), for example. You do not need to understand stochastic Riemann-Stieltjes integrals to
appreciate the rest of the discussion of spectral analysis.

‡ Uncorrelated increments are usually called orthogonal increments.

1. $F$ is nondecreasing

2. $F$ is right continuous

3. $F(f) \ge 0$ for all $f$

4. $\lim_{f \to 1/2} F(f) = \mathrm{Var}(Y_t)$

If we consider the scaled spectral distribution function $F(f)/\gamma_0$, we have a function with
the same mathematical properties as a cumulative distribution function (CDF) for a
random variable on the interval 0 to 1/2 since now $F(1/2)/\gamma_0 = 1$.

We interpret the spectral distribution by saying that, for $0 \le f_1 < f_2 \le 1/2$, the integral

$$\int_{f_1}^{f_2} dF(f) \qquad (13.3.9)$$

gives the portion of the (total) process variance $F(1/2) = \gamma_0$ that is attributable to
frequencies in the range $f_1$ to $f_2$.

Sample  Spectral  Density 

In spectral analysis, it is customary to first remove the sample mean from the series. For
the remainder of this chapter, we assume that in the definition of the periodogram, $Y_t$
represents deviations from its sample mean. Furthermore, for mathematical
convenience, we now let various functions of frequency, such as the periodogram, be defined
on the interval $(-1/2, 1/2]$. In particular, we define the sample spectral density or sample
spectrum as $\hat{S}(f) = \tfrac{1}{2}I(f)$ for all frequencies in $(-1/2, 1/2)$ and $\hat{S}(1/2) = I(1/2)$. Using
straightforward but somewhat tedious algebra, we can show that the sample spectral density can
also be expressed as

$$\hat{S}(f) = \hat{\gamma}_0 + 2\sum_{k=1}^{n-1}\hat{\gamma}_k\cos(2\pi f k) \qquad (13.3.10)$$

where $\hat{\gamma}_k$ is the sample or estimated covariance function at lag k (k = 0, 1, 2, ..., n - 1)
given by

$$\hat{\gamma}_k = \frac{1}{n}\sum_{t=k+1}^{n}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y}) \qquad (13.3.11)$$

In Fourier analysis terms, the sample spectral density is the (discrete-time) Fourier
transform of the sample covariance function. From Fourier analysis theory, it follows
that there is an inverse relationship, namely†

$$\hat{\gamma}_k = \int_{-1/2}^{1/2}\hat{S}(f)\cos(2\pi f k)\,df \qquad (13.3.12)$$


† This may be proved using the orthogonality relationships shown in Appendix J on
page 349.

In particular, notice that the total area under the sample spectral density is the sample
variance of the time series.

$$\hat{\gamma}_0 = \int_{-1/2}^{1/2}\hat{S}(f)\,df = \frac{1}{n}\sum_{t=1}^{n}(Y_t - \bar{Y})^2 \qquad (13.3.13)$$

Since each can be obtained from the other, the sample spectral density and the sample
covariance function contain the same information about the observed time series but it is
expressed in different ways. For some purposes, one is more convenient or useful, and
for other purposes the other is more convenient or useful.
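
The equivalence between the two forms of the sample spectral density can be checked numerically. The sketch below, in base R, computes the cosine sum of sample autocovariances (with divisor n, which is what acf() uses) and compares it with half the periodogram obtained from fft() at the Fourier frequencies. It then approximates the area under the sample spectral density by an average over an equally spaced grid, which is exact here because the integrand is a trigonometric polynomial. The simulated white noise series, the seed, and the grid size are arbitrary choices for illustration.

```r
set.seed(2)                              # arbitrary seed
n <- 48
y <- rnorm(n)
y <- y - mean(y)                         # deviations from the sample mean

# Sample autocovariances at lags 0, ..., n - 1 (divisor n)
gam <- as.vector(acf(y, lag.max = n - 1, type = "covariance",
                     plot = FALSE)$acf)

# Sample spectral density as a cosine sum of the autocovariances
S.hat <- function(f) gam[1] + 2 * sum(gam[-1] * cos(2 * pi * f * (1:(n - 1))))

# Compare with half the periodogram at the Fourier frequencies j/n
f <- (1:(n / 2 - 1)) / n
S.cov <- sapply(f, S.hat)
S.per <- Mod(fft(y)[2:(n / 2)])^2 / n    # (1/2) I(f), since I = (2/n)|DFT|^2
all.equal(S.cov, S.per)

# Area under S.hat over (-1/2, 1/2] equals the sample variance gam[1]
grid <- (0:999) / 1000 - 0.5
area <- mean(sapply(grid, S.hat))
all.equal(area, gam[1])
```

Both comparisons should return TRUE, which illustrates that the sample covariance function and the sample spectral density carry the same information.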

13.4  The  Spectral  Density 


For many processes, such as all stationary ARMA processes, the covariance functions
decay rapidly with increasing lag.† When that is the case, it seems reasonable to
consider the expression formed by replacing sample quantities in the sample spectral
density of Equation (13.3.10) with the corresponding theoretical quantities. To be precise, if
the covariance function $\gamma_k$ is absolutely summable, we define the theoretical (or
population) spectral density for $-1/2 < f \le 1/2$ as

$$S(f) = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k\cos(2\pi f k) \qquad (13.4.1)$$

Once more, there is an inverse relationship, given by

$$\gamma_k = \int_{-1/2}^{1/2} S(f)\cos(2\pi f k)\,df \qquad (13.4.2)$$

Mathematically, $S(f)$ is the (discrete-time) Fourier transform of the sequence $\ldots, \gamma_{-2}, \gamma_{-1},
\gamma_0, \gamma_1, \gamma_2, \ldots$, and $\{\gamma_k\}$ is the inverse Fourier transform‡ of the spectral density $S(f)$
defined on $-1/2 < f \le 1/2$.

A spectral density has all of the mathematical properties of a probability density
function on the interval $(-1/2, 1/2]$, with the exception that the total area is $\gamma_0$ rather than 1.
Moreover, it can be shown that


† Of course, this is not the case for the processes defined in Equations (13.2.4) on page 323
and (13.3.1) on page 327. Those processes have discrete components in their spectra.

‡ Notice that since $\gamma_k = \gamma_{-k}$ and the cosine function is also even, we could write

$$S(f) = \sum_{k=-\infty}^{\infty}\gamma_k e^{-2\pi i k f}$$

where $i = \sqrt{-1}$ is the imaginary unit for complex numbers. This looks more like a standard
discrete-time Fourier transform. In a similar way, Equation (13.4.2) may be rewritten as

$$\gamma_k = \int_{-1/2}^{1/2} S(f)\,e^{2\pi i k f}\,df$$

$$F(f) = \int_0^f S(x)\,dx \quad\text{for } 0 \le f \le 1/2 \qquad (13.4.3)$$

Thus, twice the area under the spectral density between frequencies $f_1$ and $f_2$ with $0 \le f_1
< f_2 \le 1/2$ is interpreted as the portion of the variance of the process that is attributable to
cosine-sine pairs in that frequency interval that compose the process.


Time-Invariant  Linear  Filters 


A time-invariant linear filter is defined by a sequence of absolutely summable constants
$\ldots, c_{-1}, c_0, c_1, c_2, c_3, \ldots$. If $\{X_t\}$ is a time series, we use these constants to filter $\{X_t\}$ and
produce a new time series $\{Y_t\}$ using the expression

$$Y_t = \sum_{j=-\infty}^{\infty} c_j X_{t-j} \qquad (13.4.4)$$

If $c_k = 0$ for $k < 0$, we say that the filter is causal. In this case, the filtering at time t
involves only present and past data values and can be carried out in “real time.”

We have already seen many examples of time-invariant linear filters in previous
chapters. Differencing (nonseasonal or seasonal) is an example. A combination of one
seasonal difference with one nonseasonal difference is another example. Any moving
average process can be considered as a linear filtering of a white noise sequence and in
fact every general linear process defined by Equation (4.1.1) on page 55 is a linear
filtering of white noise.
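
To see differencing as a causal linear filter in the sense of Equation (13.4.4), the base R sketch below applies the coefficient sequences directly with stats::filter() and compares the output with diff(). The input series, the seed, and the seasonal lag of 12 are illustrative assumptions.

```r
set.seed(3)                       # arbitrary seed
x <- rnorm(60)

# First difference: c_0 = 1, c_1 = -1 (a causal filter)
d1 <- stats::filter(x, filter = c(1, -1), method = "convolution", sides = 1)
all.equal(as.numeric(d1)[-1], diff(x))

# One seasonal (lag 12) and one nonseasonal difference combined:
# (1 - B)(1 - B^12) = 1 - B - B^12 + B^13, so the nonzero coefficients
# are c_0 = 1, c_1 = -1, c_12 = -1, c_13 = 1
cj <- c(1, -1, rep(0, 10), -1, 1)
d2 <- stats::filter(x, filter = cj, method = "convolution", sides = 1)
all.equal(as.numeric(d2)[-(1:13)], diff(diff(x, lag = 12)))
```

With sides = 1 the filter uses only present and past values, so the first output values, for which past data are missing, are NA and are dropped before the comparison.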

The expression on the right-hand side of Equation (13.4.4) is frequently called the
(discrete-time) convolution of the two sequences $\{c_j\}$ and $\{X_t\}$. An extremely useful
property of Fourier transforms is that the somewhat complicated operation of
convolution in the time domain is transformed into the very simple operation of multiplication
in the frequency domain.†

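In particular, let $S_X(f)$ be the spectral density for the $\{X_t\}$ process and let $S_Y(f)$ be the
spectral density for the $\{Y_t\}$ process. In addition, let

$$C(e^{-2\pi i f}) = \sum_{j=-\infty}^{\infty} c_j e^{-2\pi i f j} \qquad (13.4.5)$$

Then

$$\mathrm{Cov}(Y_t, Y_{t-k}) = \mathrm{Cov}\!\left(\sum_{j=-\infty}^{\infty} c_j X_{t-j},\ \sum_{s=-\infty}^{\infty} c_s X_{t-k-s}\right)$$

$$= \sum_{j=-\infty}^{\infty}\sum_{s=-\infty}^{\infty} c_j c_s\,\mathrm{Cov}(X_{t-j}, X_{t-k-s})$$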

† You may have already seen this with moment-generating functions. The density of the sum
of two independent random variables, discrete or continuous, is the convolution of their
respective densities, but the moment-generating function for the sum is the product of their
respective moment-generating functions.
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00  00  !/2 

=  Z  Z  C/'J,  e2ni{s  +  k-j)fSx(f)df 

j  =  —oo  s  =  —oo 

2 

Vi  00 

=  J  £  c ^e~2nisf  eh *ifkSx(j)df 

-Vl  s  =  -  oo 

So 

Cov(YpYt_k )  =  f "  |C(e-2^/)|25z(/)e2lt^# 

-‘/2 

But 

Cov(yp  Yf_k)  =  f‘/2  SY(f)elKifkdf 

-Vi 

so  we  must  have 

Sy(/)  =  |c(e-2lt^)|2sxc/) 

This expression is invaluable for investigating the effect of time-invariant linear filters
on spectra. In particular, it helps us find the form of the spectral densities for ARMA
processes. The function $|C(e^{-2\pi i f})|^2$ is often called the (power) transfer function of the
filter.
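As a rough numerical illustration of Equations (13.4.7) and (13.4.8), the sketch below computes the power transfer function of a short filter and recovers autocovariances by integrating the implied spectral density. The function names (`power_transfer`, `acov_from_spec`) and the example filter are assumptions made for this sketch only; it uses base R and not the ARMAspec function used elsewhere in the chapter.

```r
# Power transfer function |C(e^{-2*pi*i*f})|^2 of the filter
# Y_t = sum_j c_j X_{t-j}; `coefs` holds the c_j and `lags` the matching j.
power_transfer <- function(f, coefs, lags) {
  sapply(f, function(fr) Mod(sum(coefs * exp(-2i * pi * fr * lags)))^2)
}

# Hypothetical example filter: c_0 = 1, c_1 = -0.9, applied to white noise
# with unit variance, so that S_X(f) = 1 for all f.
coefs <- c(1, -0.9)
lags  <- c(0, 1)
S_Y <- function(f) power_transfer(f, coefs, lags) * 1

# Equation (13.4.7): gamma_k is the integral of S_Y(f) * cos(2*pi*f*k) over
# (-1/2, 1/2); the sine part integrates to zero because S_Y is symmetric.
acov_from_spec <- function(k) {
  integrate(function(f) S_Y(f) * cos(2 * pi * f * k),
            lower = -0.5, upper = 0.5)$value
}
gam <- sapply(0:2, acov_from_spec)
print(round(gam, 4))
```

The three values should be close to 1.81, -0.9, and 0, which are the lag 0, 1, and 2 autocovariances of the filtered series computed directly in the time domain.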

13.5  Spectral  Densities  for  ARMA  Processes 

White  Noise 

From Equation (13.4.1), it is easy to see that the theoretical spectral density for a white
noise process is constant for all frequencies in $-1/2 < f \le 1/2$ and, in particular,

\[ S(f) = \sigma_e^2 \qquad (13.5.1) \]

All frequencies receive equal weight in the spectral representation of white noise. This
is directly analogous to the spectrum of white light in physics: all colors (that is, all
frequencies) enter equally in white light. Finally, we understand the origin of the name
white noise!

MA(1)  Spectral  Density 

An MA(1) process is a simple filtering of white noise with $c_0 = 1$ and $c_1 = -\theta$ and so

\[
\begin{aligned}
|C(e^{-2\pi i f})|^2 &= (1 - \theta e^{2\pi i f})(1 - \theta e^{-2\pi i f}) \\
&= 1 + \theta^2 - \theta (e^{2\pi i f} + e^{-2\pi i f}) \\
&= 1 + \theta^2 - 2\theta \cos(2\pi f)
\end{aligned}
\qquad (13.5.2)
\]

Thus

\[ S(f) = [1 + \theta^2 - 2\theta \cos(2\pi f)]\,\sigma_e^2 \qquad (13.5.3) \]

When $\theta > 0$, you can show that this spectral density is an increasing function of
nonnegative frequency, while for $\theta < 0$ the function decreases.

Exhibit 13.9 displays the spectral density for an MA(1) process with $\theta = 0.9$.† Since
spectral densities are symmetric about zero frequency, we will only plot them for positive
frequencies. Recall that this MA(1) process has a relatively large negative correlation
at lag 1 but all other correlations are zero. This is reflected in the spectrum. We see
that the density is much stronger for higher frequencies than for low frequencies. The
process has a tendency to oscillate back and forth across its mean level. This rapid
oscillation is high-frequency behavior. We might say that the moving average suppresses the
lower-frequency components of the white noise process. Researchers sometimes refer to
this type of spectrum as a blue spectrum since it emphasizes the higher frequencies (that
is, those with lower period or wavelength), which correspond to blue light in the
spectrum of visible light.
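The monotonicity claim for Equation (13.5.3) can be checked numerically. The sketch below evaluates the MA(1) spectral density on a grid of nonnegative frequencies; the helper name `ma1_spec` and the grid spacing are assumptions for illustration, and the noise variance is set to 1.

```r
# MA(1) spectral density from Equation (13.5.3), book sign convention
# Y_t = e_t - theta * e_{t-1}; sigma2 is the white noise variance.
ma1_spec <- function(f, theta, sigma2 = 1) {
  (1 + theta^2 - 2 * theta * cos(2 * pi * f)) * sigma2
}

f <- seq(0, 0.5, by = 0.01)           # nonnegative frequencies only
s_pos <- ma1_spec(f, theta = 0.9)     # "blue" spectrum
s_neg <- ma1_spec(f, theta = -0.9)    # "red" spectrum

increasing_for_pos_theta <- all(diff(s_pos) > 0)
decreasing_for_neg_theta <- all(diff(s_neg) < 0)
print(c(increasing_for_pos_theta, decreasing_for_neg_theta))
print(range(s_pos))                   # (1 - theta)^2 up to (1 + theta)^2
```

With $\theta = 0.9$ the density rises from $(1 - 0.9)^2 = 0.01$ at frequency 0 to $(1 + 0.9)^2 = 3.61$ at frequency 1/2, which is the strong emphasis on high frequencies described above.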


Exhibit 13.9  Spectral Density of MA(1) Process with $\theta = 0.9$


> win.graph(width=4.875,height=2.5,pointsize=8)
> theta=.9 # Reset theta for other MA(1) plots
> ARMAspec(model=list(ma=-theta))


Exhibit 13.10 displays the spectral density for an MA(1) process with $\theta = -0.9$.
This process has positive correlation at lag 1 with all other correlations zero. Such a
process will tend to change slowly from one time instance to the next. This is
low-frequency behavior and is reflected in the shape of the spectrum. The density is much
stronger for lower frequencies than for high frequencies. Researchers sometimes call
this a red spectrum.


† In all of the plots of ARMA spectral densities that follow in this section, we take $\sigma_e^2 = 1$.
This only affects the vertical scale of the graphs, not their shape.

Exhibit 13.10  Spectral Density of MA(1) Process with $\theta = -0.9$


MA(2)  Spectral  Density 

The spectral density for an MA(2) model may be obtained similarly. The algebra is a
little longer, but the final expression is

\[ S(f) = [1 + \theta_1^2 + \theta_2^2 - 2\theta_1(1 - \theta_2)\cos(2\pi f) - 2\theta_2 \cos(4\pi f)]\,\sigma_e^2 \qquad (13.5.4) \]

Exhibit 13.11 shows a graph of such a density when $\theta_1 = 1$ and $\theta_2 = -0.6$. The
frequencies between about 0.1 and 0.18 have especially small density and there is very little
density below the frequency of 0.1. Higher frequencies enter into the picture gradually,
with the strongest periodic components at the highest frequencies.
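The longer algebra behind Equation (13.5.4) can be sketched as follows. The MA(2) filter has $c_0 = 1$, $c_1 = -\theta_1$, and $c_2 = -\theta_2$; the shorthand $\omega = 2\pi f$ is introduced here only to keep the lines short.

```latex
\begin{aligned}
|C(e^{-i\omega})|^2
 &= (1 - \theta_1 e^{-i\omega} - \theta_2 e^{-2i\omega})
    (1 - \theta_1 e^{i\omega} - \theta_2 e^{2i\omega}) \\
 &= 1 + \theta_1^2 + \theta_2^2
    - \theta_1 (e^{i\omega} + e^{-i\omega})
    - \theta_2 (e^{2i\omega} + e^{-2i\omega})
    + \theta_1 \theta_2 (e^{i\omega} + e^{-i\omega}) \\
 &= 1 + \theta_1^2 + \theta_2^2
    - 2\theta_1 (1 - \theta_2) \cos\omega
    - 2\theta_2 \cos 2\omega
\end{aligned}
```

Multiplying by $\sigma_e^2$ and substituting $\omega = 2\pi f$ gives the stated form, with the cross terms $\theta_1\theta_2 e^{\pm i\omega}$ accounting for the factor $(1 - \theta_2)$.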


Exhibit 13.11  Spectral Density of MA(2) Process with $\theta_1 = 1$ and $\theta_2 = -0.6$


> theta1=1; theta2=-0.6
> ARMAspec(model=list(ma=-c(theta1,theta2)))


AR(1) Spectral Density

To find the spectral density for AR models, we use Equation (13.4.8) “backwards.” That
is, we view the white noise process as being a linear filtering of the AR process.
Recalling the spectral density of the MA(1) series, this gives

\[ [1 + \phi^2 - 2\phi \cos(2\pi f)]\, S(f) = \sigma_e^2 \qquad (13.5.5) \]

which we solve to obtain

\[ S(f) = \frac{\sigma_e^2}{1 + \phi^2 - 2\phi \cos(2\pi f)} \qquad (13.5.6) \]

As the next two exhibits illustrate, this spectral density is a decreasing function of
frequency when $\phi > 0$, while the spectral density increases for $\phi < 0$.
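A quick check of Equation (13.5.6), assuming unit noise variance: the sketch below confirms the direction of monotonicity on a frequency grid and verifies that the density integrates to the process variance $\gamma_0 = \sigma_e^2/(1 - \phi^2)$, as Equation (13.4.7) with $k = 0$ requires. The helper name `ar1_spec` is an assumption for this illustration.

```r
# AR(1) spectral density from Equation (13.5.6)
ar1_spec <- function(f, phi, sigma2 = 1) {
  sigma2 / (1 + phi^2 - 2 * phi * cos(2 * pi * f))
}

f <- seq(0, 0.5, by = 0.01)
decreasing_for_pos_phi <- all(diff(ar1_spec(f, phi = 0.9)) < 0)
increasing_for_neg_phi <- all(diff(ar1_spec(f, phi = -0.6)) > 0)

# The density is symmetric about zero, so integrate over (0, 1/2) and double.
phi <- -0.6
total_power <- 2 * integrate(ar1_spec, lower = 0, upper = 0.5, phi = phi)$value
gamma0 <- 1 / (1 - phi^2)             # AR(1) variance when sigma2 = 1
print(c(total_power, gamma0))
```

The two printed numbers should agree to within the tolerance of the numerical integration.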


Exhibit 13.12  Spectral Density of an AR(1) Process with $\phi = 0.9$


> phi=0.9 # Reset value of phi for other AR(1) models
> ARMAspec(model=list(ar=phi))


Exhibit 13.13  Spectral Density of an AR(1) Process with $\phi = -0.6$


AR(2)  Spectral  Density 


For the AR(2) spectral density, we again use Equation (13.4.8) backwards together with
the MA(2) result to obtain

\[ S(f) = \frac{\sigma_e^2}{1 + \phi_1^2 + \phi_2^2 - 2\phi_1(1 - \phi_2)\cos(2\pi f) - 2\phi_2 \cos(4\pi f)} \qquad (13.5.7) \]

Just as with the correlation properties, the spectral density for an AR(2) model can
exhibit a variety of behaviors depending on the actual values of the two $\phi$ parameters.

Exhibits 13.14 and 13.15 display two AR(2) spectral densities that show very different
behavior: a peak in one case and a trough in the other.


Exhibit 13.14  Spectral Density of AR(2) Process: $\phi_1 = 1.5$ and $\phi_2 = -0.75$


> phi1=1.5; phi2=-.75
> # Reset values of phi1 & phi2 for other AR(2) models
> ARMAspec(model=list(ar=c(phi1,phi2)))

Jenkins and Watts (1968, p. 229) have noted that the different spectral shapes for
an AR(2) spectrum are determined by the inequality

\[ |\phi_1 (1 - \phi_2)| < |4\phi_2| \qquad (13.5.8) \]

and the results are best summarized in the display in Exhibit 13.16. In this display, the
dashed curve is the border between the regions of real roots and complex roots of the
AR(2) characteristic equation. The solid curves are determined from the inequality
given in Equation (13.5.8).


Exhibit 13.15  Spectral Density of AR(2) Process with $\phi_1 = 0.1$ and $\phi_2 = 0.4$


Exhibit 13.16  AR(2) Parameter Values for Various Spectral Density Shapes

Note that Jenkins and Watts also showed that the frequency $f_0$ at which the peak or
trough occurs will satisfy

\[ \cos(2\pi f_0) = -\frac{\phi_1 (1 - \phi_2)}{4\phi_2} \qquad (13.5.9) \]

It is commonly thought that complex roots are associated with a peak spectrum. But
notice that there is a small region of parameter values where the roots are complex but
the spectrum is of either high or low frequency with no intermediate peak.
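Equation (13.5.9) can be checked against a brute-force search. The sketch below uses the parameter values of Exhibit 13.14, evaluates Equation (13.5.7) on a fine grid, and compares the grid maximizer with the closed-form $f_0$. The function name `ar2_spec` and the grid step are assumptions for this illustration, and $\sigma_e^2 = 1$.

```r
# AR(2) spectral density from Equation (13.5.7)
ar2_spec <- function(f, phi1, phi2, sigma2 = 1) {
  sigma2 / (1 + phi1^2 + phi2^2 -
            2 * phi1 * (1 - phi2) * cos(2 * pi * f) -
            2 * phi2 * cos(4 * pi * f))
}

phi1 <- 1.5; phi2 <- -0.75

# Inequality (13.5.8): an interior peak or trough is possible when this holds
has_interior_extremum <- abs(phi1 * (1 - phi2)) < abs(4 * phi2)

# Closed-form location from Equation (13.5.9)
f0 <- acos(-phi1 * (1 - phi2) / (4 * phi2)) / (2 * pi)

# Brute-force location of the maximum on a grid of frequencies
step <- 0.0005
f <- seq(0, 0.5, by = step)
f_grid_max <- f[which.max(ar2_spec(f, phi1, phi2))]

print(c(f0 = f0, f_grid_max = f_grid_max))
```

For these values $\cos(2\pi f_0) = 0.875$, so $f_0$ is roughly 0.08, and the grid maximizer should lie within one grid step of it.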


ARMA(1,1)  Spectral  Density 


Combining what we know for MA(1) and AR(1) models, we can easily obtain the
spectral density for the ARMA(1,1) mixed model

\[ S(f) = \frac{1 + \theta^2 - 2\theta \cos(2\pi f)}{1 + \phi^2 - 2\phi \cos(2\pi f)}\,\sigma_e^2 \qquad (13.5.10) \]

Exhibit 13.17 provides an example of the spectrum for an ARMA(1,1) model with $\phi = 0.5$
and $\theta = 0.8$.


Exhibit 13.17  Spectral Density of ARMA(1,1) with $\phi = 0.5$ and $\theta = 0.8$


> phi=0.5; theta=0.8
> ARMAspec(model=list(ar=phi,ma=-theta))


ARMA(p,q)


For the general ARMA(p,q) case, the spectral density may be expressed in terms of the
AR and MA characteristic polynomials as

\[ S(f) = \left| \frac{\theta(e^{-2\pi i f})}{\phi(e^{-2\pi i f})} \right|^2 \sigma_e^2 \qquad (13.5.11) \]

This may be further expressed in terms of the reciprocal roots of these polynomials, but
we will not pursue those expressions here. This type of spectral density is often referred
to as a rational spectral density.
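Equation (13.5.11) translates almost directly into code. The sketch below evaluates both characteristic polynomials at $e^{-2\pi i f}$ and compares the result with the ARMA(1,1) closed form of Equation (13.5.10). The function `arma_spec` is a hypothetical helper written for this illustration, not the ARMAspec function used in the exhibits; it follows this book's sign convention $\theta(x) = 1 - \theta_1 x - \cdots - \theta_q x^q$, which is why no sign flip is applied to the MA coefficients here.

```r
# Rational spectral density of an ARMA(p,q) process, Equation (13.5.11).
# ar = c(phi_1, ..., phi_p), ma = c(theta_1, ..., theta_q), both in the
# convention phi(x) = 1 - phi_1 x - ... and theta(x) = 1 - theta_1 x - ...
arma_spec <- function(f, ar = numeric(0), ma = numeric(0), sigma2 = 1) {
  z <- exp(-2i * pi * f)
  poly_at_z <- function(coefs) {
    val <- rep(1 + 0i, length(z))
    for (k in seq_along(coefs)) val <- val - coefs[k] * z^k
    val
  }
  sigma2 * Mod(poly_at_z(ma))^2 / Mod(poly_at_z(ar))^2
}

f <- seq(0, 0.5, by = 0.005)
phi <- 0.5; theta <- 0.8

general <- arma_spec(f, ar = phi, ma = theta)
closed_form <- (1 + theta^2 - 2 * theta * cos(2 * pi * f)) /
               (1 + phi^2 - 2 * phi * cos(2 * pi * f))
max_abs_diff <- max(abs(general - closed_form))
print(max_abs_diff)
```

The printed difference should be at the level of floating-point rounding. Calling `arma_spec` with empty `ar` and `ma` returns the constant white noise density.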

Seasonal  ARMA  Processes 

Since  seasonal  ARMA  processes  are  just  special  ARMA  processes,  all  of  our  previous 
work  will  carry  over  here.  Multiplicative  seasonal  models  can  be  thought  of  as  applying 
two  linear  filters  consecutively.  We  will  just  give  two  examples. 


Consider the process defined by the seasonal AR model

\[ (1 - \phi B)(1 - \Phi B^{12}) Y_t = e_t \qquad (13.5.12) \]

Manipulating the two factors separately yields

\[ S(f) = \frac{\sigma_e^2}{[1 + \phi^2 - 2\phi \cos(2\pi f)]\,[1 + \Phi^2 - 2\Phi \cos(2\pi 12 f)]} \qquad (13.5.13) \]

An example of this spectrum is shown in Exhibit 13.18, where $\phi = 0.5$, $\Phi = 0.9$, and $s = 12$.
The seasonality is reflected in the many spikes of decreasing magnitude at
frequencies of 0, 1/12, 2/12, 3/12, 4/12, 5/12, and 6/12.

As a second example, consider a seasonal MA process

\[ Y_t = (1 - \theta B)(1 - \Theta B^{12}) e_t \qquad (13.5.14) \]

The corresponding spectral density is given by

\[ S(f) = [1 + \theta^2 - 2\theta \cos(2\pi f)]\,[1 + \Theta^2 - 2\Theta \cos(2\pi 12 f)]\,\sigma_e^2 \qquad (13.5.15) \]

Exhibit 13.19 shows this spectral density for parameter values $\theta = 0.4$ and $\Theta = 0.9$.
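The spike pattern described for the seasonal AR model can be reproduced from Equation (13.5.13) alone. The sketch below evaluates that density at the seasonal frequencies $k/12$ and at the midpoints between them, using the parameter values of Exhibit 13.18 and $\sigma_e^2 = 1$. The helper name `seasonal_ar_spec` is an assumption for this illustration.

```r
# Seasonal AR spectral density from Equation (13.5.13), period s
seasonal_ar_spec <- function(f, phi, Phi, s = 12, sigma2 = 1) {
  sigma2 / ((1 + phi^2 - 2 * phi * cos(2 * pi * f)) *
            (1 + Phi^2 - 2 * Phi * cos(2 * pi * s * f)))
}

k <- 0:6
at_harmonics <- seasonal_ar_spec(k / 12, phi = 0.5, Phi = 0.9)
between <- seasonal_ar_spec((k[-7] + 0.5) / 12, phi = 0.5, Phi = 0.9)

print(round(at_harmonics, 2))   # spikes at 0, 1/12, ..., 6/12
print(round(between, 3))        # much smaller values halfway between spikes
```

At each seasonal frequency the second factor in the denominator equals $(1 - \Phi)^2 = 0.01$, so the spike heights are governed by the nonseasonal factor, which decreases with frequency when $\phi > 0$.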


Exhibit 13.18  Spectral Density of Seasonal AR with $\phi = 0.5$, $\Phi = 0.9$, s = 12


> phi=.5; PHI=.9
> ARMAspec(model=list(ar=phi,seasonal=list(sar=PHI,period=12)))


Exhibit 13.19  Spectral Density of Seasonal MA with $\theta = 0.4$, $\Theta = 0.9$, s = 12


> theta=.4; Theta=.9
> ARMAspec(model=list(ma=-theta,seasonal=list(sma=-Theta,
    period=12)))


13.6  Sampling  Properties  of  the  Sample  Spectral  Density 


To introduce this section, we consider a time series with known properties. Suppose that
we simulate an AR(1) model with $\phi = -0.6$ of length n = 200. Exhibit 13.13 on page
336 shows the theoretical spectral density for such a series. The sample spectral density
for our simulated series is displayed in Exhibit 13.20, with the smooth theoretical
spectral density shown as a dotted line. Even with a sample of size 200, the sample spectral
density is extremely variable from one frequency point to the next. This is surely not an
acceptable estimate of the theoretical spectrum for this process. We must investigate the
sampling properties of the sample spectral density to understand the behavior that we
see here.

To investigate the sampling properties of the sample spectral density, we begin with
the simplest case, where the time series $\{Y_t\}$ is zero-mean normal white noise with
variance $\gamma_0$. Recall that

\[ \hat{A}_f = \frac{2}{n} \sum_{t=1}^{n} Y_t \cos(2\pi t f) \quad \text{and} \quad \hat{B}_f = \frac{2}{n} \sum_{t=1}^{n} Y_t \sin(2\pi t f) \qquad (13.6.1) \]

For now, consider only nonzero Fourier frequencies $f = j/n < 1/2$. Since $\hat{A}_f$ and $\hat{B}_f$
are linear functions of the time series $\{Y_t\}$, they each have a normal distribution. We can
evaluate the means and variances using the orthogonality properties of the cosines and
sines.† We find that $\hat{A}_f$ and $\hat{B}_f$ each have mean zero and variance $2\gamma_0/n$. We can also use
the orthogonality properties to show that $\hat{A}_f$ and $\hat{B}_f$ are uncorrelated and thus
independent since they are jointly bivariate normal. Similarly, it can be shown that for any two
distinct Fourier frequencies $f_1$ and $f_2$, $\hat{A}_{f_1}$, $\hat{A}_{f_2}$, $\hat{B}_{f_1}$, and $\hat{B}_{f_2}$ are jointly independent.

† See Appendix J on page 349.
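The variance and covariance claims follow from the orthogonality sums, and they can be confirmed without any simulation. For white noise with variance $\gamma_0$, $Var(\hat{A}_f) = (4/n^2)\gamma_0 \sum_t \cos^2(2\pi t f)$, and similarly for the other moments. The sketch below evaluates these sums at every nonzero Fourier frequency below 1/2; the values n = 200 and $\gamma_0 = 1$ and all variable names are assumptions for this illustration.

```r
n <- 200
gamma0 <- 1                     # white noise variance (assumed value)
tt <- 1:n
j <- 1:(n / 2 - 1)              # nonzero Fourier frequencies f = j/n < 1/2

# Second moments of A-hat and B-hat implied by Equation (13.6.1)
var_A <- sapply(j, function(jj)
  (4 / n^2) * gamma0 * sum(cos(2 * pi * tt * jj / n)^2))
var_B <- sapply(j, function(jj)
  (4 / n^2) * gamma0 * sum(sin(2 * pi * tt * jj / n)^2))
cov_AB <- sapply(j, function(jj)
  (4 / n^2) * gamma0 * sum(cos(2 * pi * tt * jj / n) * sin(2 * pi * tt * jj / n)))

print(range(var_A))             # all equal to 2 * gamma0 / n
print(max(abs(cov_AB)))         # zero up to rounding
```

Every variance equals $2\gamma_0/n = 0.01$ here, and the covariances vanish up to rounding error.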


Exhibit 13.20  Sample Spectral Density for a Simulated AR(1) Process


> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(271435); n=200; phi=-0.6
> y=arima.sim(model=list(ar=phi),n=n)
> sp=spec(y,log='no',xlab='Frequency',
    ylab='Sample Spectral Density',sub='')
> lines(sp$freq,ARMAspec(model=list(ar=phi),freq=sp$freq,
    plot=F)$spec,lty='dotted'); abline(h=0)


Furthermore, we know that the square of a standard normal has a chi-square
distribution with one degree of freedom and that the sum of independent chi-square variables
is chi-square distributed with degrees of freedom added together. Since $S(f) = \gamma_0$, we
find that

\[ \frac{n}{2\gamma_0} [(\hat{A}_f)^2 + (\hat{B}_f)^2] = \frac{2\hat{S}(f)}{S(f)} \qquad (13.6.2) \]

has a chi-square distribution with two degrees of freedom.

Recall that a chi-square variable has a mean equal to its degrees of freedom and a
variance equal to twice its degrees of freedom. With these facts, we quickly discover
that

\[ \hat{S}(f_1) \text{ and } \hat{S}(f_2) \text{ are independent for } f_1 \ne f_2 \qquad (13.6.3) \]

\[ E[\hat{S}(f)] = S(f) \qquad (13.6.4) \]

and

\[ Var[\hat{S}(f)] = S^2(f) \qquad (13.6.5) \]

Equation (13.6.4) expresses the desirable fact that the sample spectral density is an
unbiased estimator of the theoretical spectral density.

Unfortunately, Equation (13.6.5) shows that the variance in no way depends on the
sample size n. Even in this simple case, the sample spectral density is not a consistent
estimator of the theoretical spectral density. It does not get better (that is, have smaller
variance) as the sample size increases. The reason the sample spectral density is
inconsistent is basically this: Even if we only consider Fourier frequencies, 1/n, 2/n, ..., we are
trying to estimate more and more “parameters”; that is, S(1/n), S(2/n), ... . As the sample
size increases, there are not enough data points per parameter to produce consistent
estimates.
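A small simulation makes the inconsistency concrete. The sketch below assumes normal white noise with unit variance, so that $S(f) = 1$, and uses the relation $\hat{S}(f) = (n/4)[(\hat{A}_f)^2 + (\hat{B}_f)^2]$ implied by Equation (13.6.2). The function names, the seed, the choice of frequency $f = 1/4$, and the sample sizes are all assumptions for this illustration.

```r
set.seed(1)

# Sample spectral density at the Fourier frequency f = j/n,
# built from A-hat and B-hat as in Equations (13.6.1) and (13.6.2)
sample_spec_at <- function(y, j) {
  n <- length(y)
  tt <- 1:n
  f <- j / n
  A <- (2 / n) * sum(y * cos(2 * pi * tt * f))
  B <- (2 / n) * sum(y * sin(2 * pi * tt * f))
  (n / 4) * (A^2 + B^2)
}

# Mean and variance of S-hat(1/4) over repeated white noise samples of size n
sim_moments <- function(n, reps = 2000) {
  vals <- replicate(reps, sample_spec_at(rnorm(n), j = n / 4))
  c(mean = mean(vals), var = var(vals))
}

res <- sapply(c(48, 192, 768), sim_moments)
colnames(res) <- c("n=48", "n=192", "n=768")
print(round(res, 2))
```

Both rows should stay near 1 for every sample size: the estimator is unbiased, but its variance stays near $S^2(f) = 1$ instead of shrinking as n grows.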

The results expressed in Equations (13.6.3)-(13.6.5) in fact hold more generally. In
the exercises, we ask you to argue that for any white noise (not necessarily normal)
the mean result holds exactly and the $\hat{A}_f$ and $\hat{B}_f$ that make up $\hat{S}(f_1)$ and $\hat{S}(f_2)$ are at
least uncorrelated for $f_1 \ne f_2$.

To state more general results, suppose $\{Y_t\}$ is any linear process

\[ Y_t = e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots \qquad (13.6.6) \]

where the e’s are independent and identically distributed with zero mean and common
variance. Suppose that the $\psi$-coefficients are absolutely summable, and let $f_1 \ne f_2$ be any
frequencies in 0 to 1/2. Then it may be shown† that as the sample size increases without
limit

\[ \frac{2\hat{S}(f_1)}{S(f_1)} \quad \text{and} \quad \frac{2\hat{S}(f_2)}{S(f_2)} \qquad (13.6.7) \]

converge in distribution to independent chi-square random variables, each with two
degrees of freedom.

To investigate the usefulness of approximations based on Equations (13.6.7),
(13.6.4), and (13.6.5), we will display results from two simulations. We first simulated
1000 replications of an MA(1) time series with $\theta = 0.9$, each of length n = 48. The white
noise series used to create each MA(1) series was selected independently from a
t-distribution with five degrees of freedom scaled to unit variance. From the 1000 series, we
calculated 1000 sample spectral densities.

Exhibit 13.21 shows the average of the 1000 sample spectral densities evaluated at
the 24 Fourier frequencies associated with n = 48. The solid line is the theoretical
spectral density. It appears that the sample spectral densities are unbiased to a useful
approximation in this case.

† See, for example, Fuller (1996, pp. 360-361).

Exhibit 13.21  Average Sample Spectral Density: Simulated MA(1), $\theta = 0.9$, n = 48


For  the  extensive  R  code  to  produce  Exhibits  13.21  through  13.26, 
please  see  the  Chapter  13  script  file  associated  with  this  book. 


Exhibit 13.22 plots the standard deviations of the sample spectral densities over the
1000 replications. According to Equation (13.6.5), we hope that they match the
theoretical spectral density at the Fourier frequencies. Again the approximation seems to be
quite acceptable.
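A much-reduced sketch of this kind of simulation is given below. It is not the script file mentioned above, only an illustration under stated assumptions: the seed and all variable names are invented here, the sample spectral density is computed with `fft` as $|\sum_t Y_t e^{-2\pi i t f}|^2 / n$ (which equals $(n/4)[(\hat{A}_f)^2 + (\hat{B}_f)^2]$), and only the 23 nonzero Fourier frequencies below 1/2 are used.

```r
set.seed(2)
n <- 48; theta <- 0.9; reps <- 1000
j <- 1:(n / 2 - 1)                 # Fourier frequencies j/n strictly below 1/2
f <- j / n

# One replication: MA(1) driven by t(5) noise scaled to unit variance
sim_one <- function() {
  e <- rt(n + 1, df = 5) / sqrt(5 / 3)   # Var of t with 5 df is 5/3
  y <- e[-1] - theta * e[-(n + 1)]       # Y_t = e_t - theta * e_{t-1}
  Mod(fft(y)[j + 1])^2 / n               # sample spectral density at f = j/n
}

S_hat <- replicate(reps, sim_one())      # matrix: frequencies x replications
avg  <- rowMeans(S_hat)
sdev <- apply(S_hat, 1, sd)
theory <- 1 + theta^2 - 2 * theta * cos(2 * pi * f)   # Equation (13.5.3)

print(round(cbind(f, theory, avg, sdev), 3))
```

Plotting `avg` and `sdev` against `f` with `theory` overlaid gives displays in the spirit of Exhibits 13.21 and 13.22. With n as small as 48, a small finite-sample bias at the lowest frequencies is to be expected.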


Exhibit 13.22  Standard Deviation of Sample Spectral Density: Simulated MA(1), $\theta = 0.9$, n = 48

To check on the shape of the sample spectral density distribution, we constructed a
QQ plot comparing the observed quantiles with those of a chi-square distribution with
two degrees of freedom. Of course, we could do those for any of the Fourier
frequencies. Exhibit 13.23 shows the results at the frequency 15/48. The agreement with the
chi-square distribution appears to be acceptable.


Exhibit  13.23  QQ  Plot  of  Spectral  Distribution  at  f  =  15/48 



We repeated similar displays and calculations when the true model was an AR(2)
with $\phi_1 = 1.5$, $\phi_2 = -0.75$, and n = 96. Here we used normal white noise. The results are
displayed in Exhibits 13.24, 13.25, and 13.26. Once more the simulation results with n =
96 and 1000 replications seem to follow those suggested by limit theory quite
remarkably.


Exhibit 13.24  Average Sample Spectral Density: Simulated AR(2), $\phi_1 = 1.5$, $\phi_2 = -0.75$, n = 96


Exhibit 13.25  Standard Deviation of Sample Spectral Density: Simulated AR(2), $\phi_1 = 1.5$, $\phi_2 = -0.75$, n = 96


Exhibit 13.26  QQ Plot of Spectral Distribution at f = 40/96

Of course, none of these results tell us that the sample spectral density is an
acceptable estimator of the underlying theoretical spectral density. The sample spectral density
is quite generally approximately unbiased but also inconsistent, with far too much
variability to be a useful estimator as it stands. The approximate independence at the Fourier
frequencies also helps explain the extreme variability in the behavior of the sample
spectral density.

13.7  Summary


The chapter introduces the idea of modeling time series as linear combinations of sines
and cosines, so-called spectral analysis. The periodogram was introduced as a tool for
finding the contribution of the various frequencies in the spectral representation of the
series. The ideas were then extended to modeling with a continuous range of
frequencies. Spectral densities of the ARMA models were explored. Finally, the sampling
properties of the sample spectral density were presented. Since the sample spectral density is
not a consistent estimator of the theoretical spectral density, we must search further for
an acceptable estimator. That is the subject of the next chapter.


Exercises 


13.1  Find A and B so that $3\cos(2\pi f t + 0.4) = A\cos(2\pi f t) + B\sin(2\pi f t)$.

13.2  Find R and $\Phi$ so that $R\cos(2\pi f t + \Phi) = \cos(2\pi f t) + 3\sin(2\pi f t)$.

13.3  Consider the series displayed in Exhibit 13.2 on page 320.

(a)  Verify that regressing the series on $\cos(2\pi f t)$ and $\sin(2\pi f t)$ for f = 4/96
provides perfect estimates of A and B.

(b)  Use Equations (13.1.5) on page 321 to obtain the relationship between R, $\Phi$, A,
and B for the cosine component at frequency f = 14/96. (For this component,
the amplitude is 1 and the phase is $0.6\pi$.)

(c)  Verify that regressing the series on $\cos(2\pi f t)$ and $\sin(2\pi f t)$ for f = 14/96
provides perfect estimates of A and B.

(d)  Verify that regressing the series on $\cos(2\pi f t)$ and $\sin(2\pi f t)$ for both f = 4/96
and f = 14/96 together provides perfect estimates of $A_4$, $B_4$, $A_{14}$, and $B_{14}$.

(e)  Verify that regressing the series on $\cos(2\pi f t)$ and $\sin(2\pi f t)$ for f = 3/96 and f =
13/96 together provides perfect estimates of $A_3$, $B_3$, $A_{13}$, and $B_{13}$.

(f)  Repeat part (d) but add a third pair of cosine-sine predictor variables at any
other Fourier frequency. Verify that all of the regression coefficients are still
estimated perfectly.

13.4  Generate or choose any series of length n = 10. Show that the series may be fit
exactly by a linear combination of enough cosine-sine curves at the Fourier
frequencies.

13.5  Simulate  a  signal  +  noise  time  series  from  the  model  in  Equation  (13.2.4)  on  page 
323.  Use  the  same  parameter  values  used  in  Exhibit  13.4  on  page  324. 

(a)  Plot  the  time  series  and  look  for  the  periodicities.  Can  you  see  them? 

(b)  Plot  the  periodogram  for  the  simulated  series.  Are  the  periodicities  clear  now? 

13.6  Show  that  the  covariance  function  for  the  series  defined  by  Equation  (13.3.1)  on 
page  327  is  given  by  the  expression  in  Equation  (13.3.2). 

13.7  Display the algebra that establishes Equation (13.3.10) on page 329.

13.8  Show that if $\{X_t\}$ and $\{Y_t\}$ are independent stationary series, then the spectral
density of $\{X_t + Y_t\}$ is the sum of the spectral densities of $\{X_t\}$ and $\{Y_t\}$.

13.9  Show that when $\theta > 0$ the spectral density for an MA(1) process is an increasing
function of frequency, while for $\theta < 0$ this function decreases.

13.10  Graph the theoretical spectral density for an MA(1) process with $\theta = 0.6$. Interpret
the implications of the shape of the spectrum on the possible plots of the time
series values.

13.11  Graph the theoretical spectral density for an MA(1) process with $\theta = -0.8$.
Interpret the implications of the shape of the spectrum on the possible plots of the time
series values.

13.12  Show that when $\phi > 0$ the spectral density for an AR(1) process is a decreasing
function of frequency, while for $\phi < 0$ the spectral density increases.

13.13  Graph the theoretical spectral density for an AR(1) time series with $\phi = 0.7$.
Interpret the implications of the shape of the spectrum on the possible plots of the time
series values.

13.14  Graph the theoretical spectral density for an AR(1) time series with $\phi = -0.4$.
Interpret the implications of the shape of the spectrum on the possible plots of the
time series values.

13.15  Graph the theoretical spectral density for an MA(2) time series with $\theta_1 = -0.5$ and
$\theta_2 = 0.9$. Interpret the implications of the shape of the spectrum on the possible
time series plots of the series values.

13.16  Graph the theoretical spectral density for an MA(2) time series with $\theta_1 = 0.5$ and
$\theta_2 = -0.9$. Interpret the implications of the shape of the spectrum on the possible
time series plots of the series values.

13.17  Graph the theoretical spectral density for an AR(2) time series with $\phi_1 = -0.1$ and
$\phi_2 = -0.9$. Interpret the implications of the shape of the spectrum on the possible
time series plots of the series values.

13.18  Graph the theoretical spectral density for an AR(2) process with $\phi_1 = 1.8$ and $\phi_2 = -0.9$.
Interpret the implications of the shape of the spectrum on the possible plots
of the time series values.

13.19  Graph the theoretical spectral density for an AR(2) process with $\phi_1 = -1$ and $\phi_2 = -0.8$.
Interpret the implications of the shape of the spectrum on the possible plots
of the time series values.

13.20  Graph the theoretical spectral density for an AR(2) process with $\phi_1 = 0.5$ and $\phi_2 = 0.4$.
Interpret the implications of the shape of the spectrum on the possible plots of
the time series values.

13.21  Graph the theoretical spectral density for an AR(2) process with $\phi_1 = 0$ and $\phi_2 = 0.8$.
Interpret the implications of the shape of the spectrum on the possible plots of
the time series values.

13.22  Graph the theoretical spectral density for an AR(2) process with $\phi_1 = 0.8$ and $\phi_2 = -0.2$.
Interpret the implications of the shape of the spectrum on the possible plots
of the time series values.

13.23  Graph the theoretical spectral density for an ARMA(1,1) time series with $\phi = 0.5$
and $\theta = 0.8$. Interpret the implications of the shape of the spectrum on the possible
plots of the time series values.

13.24  Graph the theoretical spectral density for an ARMA(1,1) process with $\phi = 0.95$
and $\theta = 0.8$. Interpret the implications of the shape of the spectrum on the possible
plots of the time series values.

13.25  Let $\{X_t\}$ be a stationary time series and $\{Y_t\}$ be defined by $Y_t = (X_t + X_{t-1})/2$.

(a)  Find the power transfer function for this linear filter.

(b)  Is this a causal filter?

(c)  Graph the power transfer function and describe the effect of using this filter.
That is, what frequencies will be retained (emphasized) and what frequencies
will be deemphasized (attenuated) by this filtering?

13.26  Let $\{X_t\}$ be a stationary time series and let $\{Y_t\}$ be defined by $Y_t = X_t - X_{t-1}$.

(a)  Find the power transfer function for this linear filter.

(b)  Is this a causal filter?

(c)  Graph the power transfer function and describe the effect of using this filter.
That is, what frequencies will be retained (emphasized) and what frequencies
will be deemphasized (attenuated) by this filtering?

13.27  Let $\{X_t\}$ be a stationary time series and let $Y_t = (X_{t+1} + X_t + X_{t-1})/3$ define
$\{Y_t\}$.

(a)  Find the power transfer function for this linear filter.

(b)  Is this a causal filter?

(c)  Graph the power transfer function and describe the effect of using this filter.
That is, what frequencies will be retained (emphasized) and what frequencies
will be deemphasized (attenuated) by this filtering?

13.28  Let $\{X_t\}$ be a stationary time series and let $Y_t = (X_t + X_{t-1} + X_{t-2})/3$ define
$\{Y_t\}$.

(a)  Show that the power transfer function of this filter is the same as the power
transfer function of the filter defined in Exercise 13.27.

(b)  Is this a causal filter?

13.29  Let $\{X_t\}$ be a stationary time series and let $Y_t = X_t - X_{t-4}$ define $\{Y_t\}$.

(a)  Find the power transfer function for this linear filter.

(b)  Graph the power transfer function and describe the effect of using this filter.
That is, what frequencies will be retained (emphasized) and what frequencies
will be deemphasized (attenuated) by this filtering?

13.30  Let $\{X_t\}$ be a stationary time series and let $\{Y_t\}$ be defined by
$Y_t = (X_{t+1} - 2X_t + X_{t-1})/3$.

(a)  Find the power transfer function for this linear filter.

(b)  Graph the power transfer function and describe the effect of using this filter.
That is, what frequencies will be retained (emphasized) and what frequencies
will be deemphasized (attenuated) by this filtering?

13.31  Suppose that $\{Y_t\}$ is a white noise process not necessarily normal. Use the
orthogonality properties given in Appendix J to establish the following at the Fourier
frequencies.

(a)  The sample spectral density is an unbiased estimator of the theoretical spectral
density.

(b)  The variables $\hat{A}_{f_1}$ and $\hat{B}_{f_2}$ are uncorrelated for any Fourier frequencies $f_1$, $f_2$.

(c)  If the Fourier frequencies $f_1 \ne f_2$, the variables $\hat{A}_{f_1}$ and $\hat{A}_{f_2}$ are uncorrelated.

13.32  Carry out a simulation analysis similar to those reported in Exhibits 13.21, 13.22,
13.23, and 13.24. Use an AR(2) model with $\phi_1 = 0.5$, $\phi_2 = -0.8$, and n = 48.
Replicate the series 1000 times.

(a)  Display the average sample spectral density by frequency and compare it with
large sample theory.

(b)  Display the standard deviation of the sample spectral density by frequency
and compare it with large sample theory.

(c)  Display the QQ plot of the appropriately scaled sample spectral density
compared with large sample theory at several frequencies. Discuss your results.

13.33  Carry out a simulation analysis similar to those reported in Exhibits 13.21, 13.22,
13.23, and 13.24. Use an AR(2) model with $\phi_1 = -1$, $\phi_2 = -0.75$, and n = 96.
Replicate the time series 1000 times.

(a)  Display the average sample spectral density by frequency and compare it with
the results predicted by large sample theory.

(b)  Display the standard deviation of the sample spectral density by frequency
and compare it with the results predicted by large sample theory.

(c)  Display the QQ plot of the appropriately scaled sample spectral density and
compare with the results predicted by large sample theory at several
frequencies. Discuss your results.

13.34 Simulate a zero-mean, unit-variance, normal white noise time series of length n =
1000. Display the periodogram of the series, and comment on the results.

For j, k = 0, 1, 2, ..., n/2, we have

$$\sum_{t=1}^{n} \cos\left(2\pi\frac{j}{n}t\right) = 0 \quad \text{if } j \neq 0 \tag{13.J.1}$$

$$\sum_{t=1}^{n} \sin\left(2\pi\frac{j}{n}t\right) = 0 \tag{13.J.2}$$

$$\sum_{t=1}^{n} \cos\left(2\pi\frac{j}{n}t\right)\sin\left(2\pi\frac{k}{n}t\right) = 0 \tag{13.J.3}$$

$$\sum_{t=1}^{n} \cos\left(2\pi\frac{j}{n}t\right)\cos\left(2\pi\frac{k}{n}t\right) =
\begin{cases}
\dfrac{n}{2} & \text{if } j = k \ (j \neq 0 \text{ or } n/2) \\
n & \text{if } j = k = 0 \\
0 & \text{if } j \neq k
\end{cases} \tag{13.J.4}$$

$$\sum_{t=1}^{n} \sin\left(2\pi\frac{j}{n}t\right)\sin\left(2\pi\frac{k}{n}t\right) =
\begin{cases}
\dfrac{n}{2} & \text{if } j = k \ (j \neq 0 \text{ or } n/2) \\
0 & \text{if } j \neq k
\end{cases} \tag{13.J.5}$$

These are most easily proved using DeMoivre’s theorem

$$e^{-2\pi i f} = \cos(2\pi f) - i\sin(2\pi f) \tag{13.J.6}$$

or, equivalently, Euler’s formulas,

$$\cos(2\pi f) = \frac{e^{2\pi i f} + e^{-2\pi i f}}{2} \quad \text{and} \quad \sin(2\pi f) = \frac{e^{2\pi i f} - e^{-2\pi i f}}{2i} \tag{13.J.7}$$

together with the result for the sum of a finite geometric series, namely

$$\sum_{j=1}^{n} r^{j} = \frac{r(1 - r^{n})}{1 - r} \tag{13.J.8}$$

for real or complex $r \neq 1$.
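The orthogonality relations above can be checked numerically. The following is only an illustrative sketch in base R: the choice n = 96 and the object names (C, S, CC, SS, CS) are introduced here and do not come from the text.

```r
# Numerical check of the orthogonality of cosine and sine sequences
# at the Fourier frequencies j/n, j = 0, 1, ..., n/2 (base R only).
n <- 96
t <- 1:n
js <- 0:(n / 2)

# Columns hold cos(2*pi*j*t/n) and sin(2*pi*j*t/n) for each j
C <- outer(t, js, function(t, j) cos(2 * pi * j * t / n))
S <- outer(t, js, function(t, j) sin(2 * pi * j * t / n))

CC <- crossprod(C)     # sums over t of cos * cos for every pair (j, k)
SS <- crossprod(S)     # sums over t of sin * sin
CS <- crossprod(C, S)  # sums over t of cos * sin

round(diag(CC)[1:4], 10)   # n at j = 0, then n/2
round(diag(SS)[1:4], 10)   # 0 at j = 0, then n/2

# Largest cross-frequency sums; both should be zero up to rounding error
max_offdiag_CC <- max(abs(CC - diag(diag(CC))))
max_abs_CS <- max(abs(CS))
c(max_offdiag_CC, max_abs_CS)
```

Up to rounding error, every sum with j different from k vanishes, and the diagonal sums equal n/2 for 0 < j < n/2.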


Chapter  14 

Estimating  the  Spectrum 


Several alternative methods for constructing reasonable estimators of the spectral
density have been proposed and investigated over the years. We will highlight just a few of
them that have gained the most acceptance in light of present-day computing power.
So-called nonparametric estimation of the spectral density (that is, smoothing of the
sample spectral density) assumes very little about the shape of the “true” spectral
density. Parametric estimation assumes that an autoregressive model, perhaps of high
order, provides an adequate fit to the time series. The estimated spectral density is then
based on the theoretical spectral density of the fitted AR model. Some other methods are
touched on briefly.

14.1  Smoothing  the  Spectral  Density 


The  basic  idea  here  is  that  most  spectral  densities  will  change  very  little  over  small 
intervals  of  frequencies.  As  such,  we  should  be  able  to  average  the  values  of  the  sample 
spectral  density  over  small  intervals  of  frequencies  to  gain  reduced  variability.  In  doing 
so,  we  must  keep  in  mind  that  we  may  introduce  bias  into  the  estimates  if,  in  fact,  the 
theoretical  spectral  density  does  change  substantially  over  that  interval.  There  will 
always  be  a  trade-off  between  reducing  variability  and  introducing  bias.  We  will  be 
required  to  use  judgment  to  decide  how  much  averaging  to  perform  in  a  particular  case. 

Let f be a Fourier frequency. Consider taking a simple average of the neighboring
sample spectral density values centered on frequency f and extending m Fourier
frequencies on either side of f. We are averaging 2m + 1 values of the sample spectrum, and
the smoothed sample spectral density is given by

$$\bar{S}(f) = \frac{1}{2m+1}\sum_{j=-m}^{m}\hat{S}\left(f + \frac{j}{n}\right) \tag{14.1.1}$$

(When averaging for frequencies near the end points of 0 and 1/2, we treat the
periodogram as symmetric about 0 and 1/2.)

More generally, we may smooth the sample spectrum with a weight function or
spectral window $W_m(k)$ with the properties

$$\left.
\begin{aligned}
W_m(k) &\geq 0 \\
W_m(k) &= W_m(-k) \\
\sum_{k=-m}^{m} W_m(k) &= 1
\end{aligned}
\right\} \tag{14.1.2}$$

and obtain a smoothed estimator of the spectral density as

$$\bar{S}(f) = \sum_{k=-m}^{m} W_m(k)\,\hat{S}\left(f + \frac{k}{n}\right) \tag{14.1.3}$$

The simple averaging shown in Equation (14.1.1) corresponds to the rectangular
spectral window

$$W_m(k) = \frac{1}{2m+1} \quad \text{for } -m \leq k \leq m \tag{14.1.4}$$

For historical reasons, this spectral window is usually called the Daniell spectral
window after P. J. Daniell, who first used it in the 1940s.
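As a rough illustration of the simple average in Equation (14.1.1), the sketch below smooths a raw sample spectrum directly and checks the result against `kernapply()` from base R. It is not the code behind the exhibits: the DFT-based scaling of the raw spectrum, the object names, and the restriction to interior frequencies (so that no symmetric extension at 0 and 1/2 is needed) are assumptions made for this example.

```r
# Daniell smoothing of a raw sample spectrum at interior Fourier frequencies.
set.seed(271435)
n <- 200
y <- as.numeric(arima.sim(model = list(ar = -0.6), n = n))

# Raw sample spectrum at the Fourier frequencies j/n, j = 1, ..., n/2
# (assumed scaling: squared modulus of the DFT divided by n)
dft <- fft(y - mean(y))
freq <- (1:(n / 2)) / n
s_hat <- (Mod(dft)^2 / n)[2:(n / 2 + 1)]

m <- 5
k <- kernel("daniell", m)          # rectangular window with weights 1/(2m + 1)
w <- c(rev(k$coef[-1]), k$coef)    # weights for k = -m, ..., m

# Direct use of the moving average, interior frequencies only
idx <- (m + 1):(length(s_hat) - m)
s_bar <- sapply(idx, function(j) sum(w * s_hat[(j - m):(j + m)]))

# The same values from kernapply(), which drops m points at each end
s_bar_k <- kernapply(s_hat, k)
max(abs(s_bar - s_bar_k))
```

Plotting `s_bar` against `freq[idx]` gives a smoothed curve of the kind described in the text, without the end-point treatment.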

As  an  example,  consider  the  simulated  AR(1)  series  whose  sample  spectral  density 
was  shown  in  Exhibit  13.20  on  page  341.  Exhibit  14.1  displays  the  smoothed  sample 
spectrum  using  the  Daniell  window  with  m  =  5.  The  true  spectrum  is  again  shown  as  a 
dotted  line.  The  smoothing  did  reduce  some  of  the  variability  that  we  saw  in  the  sample 
spectrum. 


Exhibit 14.1 Smoothed Spectrum Using the Daniell Window With m = 5


> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(271435); n=200; phi=-0.6
> y=arima.sim(model=list(ar=phi),n=n)
> k=kernel('daniell',m=5)
> sp=spec(y,kernel=k,log='no',sub='',xlab='Frequency',
    ylab='Smoothed Sample Spectral Density')
> lines(sp$freq,ARMAspec(model=list(ar=phi),freq=sp$freq,
    plot=F)$spec,lty='dotted')


If we make the smoothing window wider (that is, increase m), we will reduce the
variability even further. Exhibit 14.2 shows the smoothed spectrum with a choice of m =
15. The danger with more and more smoothing is that we may lose important details in
the spectrum and introduce bias. The amount of smoothing needed will always be a
matter of judgmental trial and error, recognizing the trade-off between reducing
variability and introducing bias.


Exhibit 14.2 Smoothed Spectrum Using the Daniell Window With m = 15


> k=kernel('daniell',m=15)
> sp=spec(y,kernel=k,log='no',sub='',xlab='Frequency',
    ylab='Smoothed Sample Spectral Density')
> lines(sp$freq,ARMAspec(model=list(ar=phi),freq=sp$freq,
    plot=F)$spec,lty='dotted')


Other  Spectral  Windows 

Many other spectral windows have been suggested over the years. In particular, the
abrupt change at the end points of the Daniell window could be softened by making the
weights decrease at the extremes. The so-called modified Daniell spectral window
simply defines the two extreme weights as half of the other weights, still retaining the
property that the weights sum to 1. The leftmost graph in Exhibit 14.3 shows the modified
Daniell spectral window for m = 3.

Exhibit 14.3 The Modified Daniell Spectral Window and Its Convolutions

Another common way to modify spectral windows is to use them to smooth the
periodogram more than once. Mathematically, this amounts to using the convolution of
the spectral windows. If the modified Daniell spectral window with m = 3 is used twice
(convolved with itself), we in fact are using the (almost) triangular-shaped window
shown in the middle display of Exhibit 14.3. A third smoothing (with m = 3) is
equivalent to using the spectral window shown in the rightmost panel. This spectral window
appears much like a normal curve. We could also use different values of m in the various
components of the convolutions.

Most researchers agree that the shape of the spectral window is not nearly as
important as the choice of m (or the bandwidth; see below). We will use the modified Daniell
spectral window, possibly with one or two convolutions, in our examples.†
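The weights of the modified Daniell window and of its convolutions can be inspected with base R's `kernel()` function. This is a minimal sketch; `full_weights` is a hypothetical helper defined here that expands the stored half-window into weights for k = -m, ..., m.

```r
# Modified Daniell weights for m = 3 and their convolutions (base R kernels).
full_weights <- function(k) c(rev(k$coef[-1]), k$coef)  # weights for k = -m..m

k1 <- kernel("modified.daniell", 3)          # single window
k2 <- kernel("modified.daniell", c(3, 3))    # window convolved with itself
k3 <- kernel("modified.daniell", c(3, 3, 3)) # convolved a third time

w1 <- full_weights(k1)
w2 <- full_weights(k2)
w3 <- full_weights(k3)

round(w1, 4)                   # the two end weights are half of the others
c(sum(w1), sum(w2), sum(w3))   # each set of weights sums to 1
c(k1$m, k2$m, k3$m)            # half-widths grow with each convolution
```

Plotting `w1`, `w2`, and `w3` against k shows the progression from a nearly rectangular window to a triangular one and then to a bell-shaped one.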

14.2  Bias  and  Variance 


If the theoretical spectral density does not change much over the range of frequencies
that the smoothing window covers, we expect the smoothed estimator to be
approximately unbiased. A calculation using this approximation, the spectral window properties
in Equations (14.1.2), and a short Taylor expansion produces

$$E[\bar{S}(f)] \approx \sum_{k=-m}^{m} W_m(k)\left[S(f) + \frac{k}{n}S'(f) + \frac{1}{2}\left(\frac{k}{n}\right)^{2} S''(f)\right]$$

or

$$E[\bar{S}(f)] \approx S(f) + \frac{1}{n^{2}}\frac{S''(f)}{2}\sum_{k=-m}^{m} k^{2} W_m(k) \tag{14.2.1}$$


† In R, the modified Daniell kernel is the default kernel for smoothing sample spectra, and m
may be specified by simply specifying span = 2m + 1 in the spec function, where span is an
abbreviation of the spans argument.

So an approximate value for the bias in the smoothed spectral density is given by

$$\text{bias} \approx \frac{1}{n^{2}}\frac{S''(f)}{2}\sum_{k=-m}^{m} k^{2} W_m(k) \tag{14.2.2}$$

For the Daniell rectangular spectral window, we have

$$\frac{1}{n^{2}}\sum_{k=-m}^{m} k^{2} W_m(k) = \frac{2}{n^{2}(2m+1)}\left(\frac{m^{3}}{3} + \frac{m^{2}}{2} + \frac{m}{6}\right) \tag{14.2.3}$$

and thus the bias tends to zero as $n \to \infty$ as long as $m/n \to 0$.

Using the fact that the sample spectral density values at the Fourier frequencies are
approximately uncorrelated and Equation (13.6.5) on page 341, we may also obtain a
useful approximation for the variance of the smoothed spectral density as

$$Var[\bar{S}(f)] \approx \sum_{k=-m}^{m} W_m^{2}(k)\,Var\left[\hat{S}\left(f + \frac{k}{n}\right)\right] \approx \sum_{k=-m}^{m} W_m^{2}(k)\,S^{2}(f)$$

so that

$$Var[\bar{S}(f)] \approx S^{2}(f)\sum_{k=-m}^{m} W_m^{2}(k) \tag{14.2.4}$$

Note that for the Daniell or rectangular spectral window $\sum_{k=-m}^{m} W_m^{2}(k) = \frac{1}{2m+1}$, so
that as long as $m \to \infty$ (as $n \to \infty$) we have consistency.

In general, we require that as $n \to \infty$ we have $m/n \to 0$ to reduce bias and $m \to \infty$ to
reduce variance. As a practical matter, the sample size n is usually fixed and we must
choose m to balance bias and variance considerations.
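The opposing effects of m on bias and variance can be tabulated for the Daniell window. The sketch below is illustrative only: the function name `daniell_factors`, the sample size n = 200, and the three values of m are assumptions chosen for the example. It evaluates the window-dependent factor in the approximate bias, compares it with the closed form for the rectangular window, and evaluates the sum of squared weights that multiplies the squared spectrum in the approximate variance.

```r
# Bias and variance factors for the Daniell (rectangular) window.
n <- 200
daniell_factors <- function(m, n) {
  k <- -m:m
  w <- rep(1 / (2 * m + 1), 2 * m + 1)
  c(m = m,
    bias_factor = sum(k^2 * w) / n^2,   # multiplies S''(f)/2 in the bias
    closed_form = 2 / (n^2 * (2 * m + 1)) * (m^3 / 3 + m^2 / 2 + m / 6),
    var_factor = sum(w^2))              # multiplies S(f)^2 in the variance
}

factors <- t(sapply(c(7, 14, 28), daniell_factors, n = n))
factors
```

As m grows, the bias factor increases while the variance factor 1/(2m + 1) shrinks, which is the trade-off described above.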

Jenkins and Watts (1968) suggest trying three different values of m. A small value
will give an idea where the large peaks in S(f) are but may show a large number of
peaks, many of which are spurious. A large value of m may produce a curve that is
likely to be too smooth. A compromise may then be achieved with the third value of m.
Chatfield (2004, p. 135) suggests using $m = \sqrt{n}$. Often trying values for m of $2\sqrt{n}$,
$\sqrt{n}$, and $\frac{1}{2}\sqrt{n}$ will give you some insight into the shape of the true spectrum. Since the
width of the window decreases as m decreases, this is sometimes called window closing.
As Hannan (1973, p. 311) says, “Experience is the real teacher and cannot be got from a
book.”
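Window closing can be tried quickly on simulated data. The following sketch is an assumption-laden illustration rather than the procedure used for the exhibits: it uses `spec.pgram()` from base R, an AR(1) series with coefficient -0.6, and an ad hoc roughness summary introduced here only to compare the three estimates.

```r
# Window closing: smooth one periodogram with three Daniell half-widths.
set.seed(271435)
n <- 200
y <- arima.sim(model = list(ar = -0.6), n = n)

# Half-widths 2*sqrt(n), sqrt(n), and sqrt(n)/2, rounded to integers
m_values <- round(c(2, 1, 0.5) * sqrt(n))

fits <- lapply(m_values, function(m)
  spec.pgram(y, kernel = kernel("daniell", m), taper = 0, plot = FALSE))

# Ad hoc roughness summary: standard deviation of successive differences
# of the log spectrum (smaller values indicate a smoother estimate)
roughness <- sapply(fits, function(s) sd(diff(log(s$spec))))
data.frame(m = m_values, roughness = roughness)
```

Overlaying the three estimates with `lines(fits[[i]]$freq, fits[[i]]$spec)` on a log scale makes the progression from oversmoothed to undersmoothed visible.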


14.3  Bandwidth 


In the approximate bias given by Equation (14.2.2), notice that the factor $S''(f)$ depends
on the curvature of the true spectral density and will be large in magnitude if there is a
sharp peak in S(f) near f but will be small when S(f) is relatively flat near f. This makes
intuitive sense, as the motivation for the smoothing of the sample spectral density
assumed that the true density changed very little over the range of frequencies used in
the spectral window. The square root of the other factor in the approximate bias from
Equation (14.2.2) is sometimes called the bandwidth, BW, of the spectral window,
namely

$$BW = \frac{1}{n}\sqrt{\sum_{k=-m}^{m} k^{2} W_m(k)} \tag{14.3.1}$$

As we noted in Equation (14.2.3), for the Daniell window this BW will tend to zero as
$n \to \infty$ as long as $m/n \to 0$. From Equations (14.1.2) on page 352 a spectral window has
the mathematical properties of a discrete zero-mean probability density function, so the
BW defined here may be viewed as proportional to the standard deviation of the spectral
window. As such, it is one way to measure the width of the spectral window. It is
interpreted as a measure of width of the band of frequencies used in smoothing the sample
spectral density. If the true spectrum contains two peaks that are close relative to the
bandwidth of the spectral window, those peaks will be smoothed together when we
calculate $\bar{S}(f)$ and they will not be seen as separate peaks. It should be noted that there are
many alternative definitions of bandwidth given in the time series literature. Priestley
(1981, pp. 513-528) spends considerable time discussing the advantages and
disadvantages of the various definitions.
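The bandwidth in Equation (14.3.1) is simple to compute for any set of window weights. The sketch below is illustrative: `window_bw` and `daniell` are hypothetical helpers, and n = 100 is an arbitrary choice. Because several definitions of bandwidth are in use, these values need not match the bandwidth reported by any particular software.

```r
# Bandwidth of a spectral window: (1/n) * sqrt(sum of k^2 * W(k)).
window_bw <- function(w, n) {
  m <- (length(w) - 1) / 2     # half-width implied by the weight vector
  k <- -m:m
  sqrt(sum(k^2 * w)) / n
}

n <- 100
daniell <- function(m) rep(1 / (2 * m + 1), 2 * m + 1)   # rectangular weights

m_values <- c(1, 4, 7)
bw <- sapply(m_values, function(m) window_bw(daniell(m), n))
data.frame(m = m_values, span = 2 * m_values + 1, bandwidth = bw)
```

For the rectangular window the sum reduces to m(m + 1)/3, so the bandwidth grows roughly in proportion to m for fixed n.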

14.4  Confidence  Intervals  for  the  Spectrum 


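The approximate distributional properties of the smoothed spectral density may be
easily used to obtain confidence intervals for the spectrum. The smoothed sample spectral
density is a linear combination of quantities that have approximate chi-square
distributions. A common approximation in such a case is to use some multiple of another
chi-square distribution with degrees of freedom obtained by matching means and
variances. Assuming $\bar{S}(f)$ to be roughly unbiased with variance given by Equation (14.2.4),
matching means and variances leads to approximating the distribution of

$$\frac{\nu \bar{S}(f)}{S(f)} \tag{14.4.1}$$

by a chi-square distribution with $\nu$ degrees of freedom. Since a chi-square variable with
$\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$, Equation (14.2.4) gives

$$\nu = \frac{2}{\sum_{k=-m}^{m} W_m^{2}(k)} \tag{14.4.2}$$

Letting $\chi^{2}_{\nu,\alpha}$ denote the 100$\alpha$th percentile of a chi-square distribution with $\nu$ degrees of
freedom, an approximate 100(1 − $\alpha$)% confidence interval for S(f) is

$$\frac{\nu \bar{S}(f)}{\chi^{2}_{\nu,\,1-\alpha/2}} < S(f) < \frac{\nu \bar{S}(f)}{\chi^{2}_{\nu,\,\alpha/2}} \tag{14.4.3}$$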


In this formulation, the width of the confidence interval will vary with frequency. A
review of Equation (14.2.4) on page 355 shows that the variance of $\bar{S}(f)$ is roughly
proportional to the square of its mean. As we saw earlier in Equations (5.4.1) and (5.4.2) on
page 98, this suggests that we take the logarithm of the smoothed sample spectral
density to stabilize the variance and obtain confidence intervals with width independent of
frequency as follows:

$$\log[\bar{S}(f)] + \log\left[\frac{\nu}{\chi^{2}_{\nu,\,1-\alpha/2}}\right] < \log[S(f)] < \log[\bar{S}(f)] + \log\left[\frac{\nu}{\chi^{2}_{\nu,\,\alpha/2}}\right] \tag{14.4.4}$$

For these reasons it is common practice to plot the logarithms of estimated spectra. If we
redo Exhibit 14.2 on page 353 in logarithm terms, we obtain the display shown in
Exhibit 14.4, where we have also drawn in the 95% confidence limits (dotted) and the
true spectral density (dashed) from the AR(1) model. With a few exceptions, the
confidence limits capture the true spectral density.
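The confidence limits can be computed from the window weights alone. The sketch below assumes the degrees of freedom obtained by matching means and variances, that is, 2 divided by the sum of squared window weights, and uses the modified Daniell window with m = 15 as an example; the variable names are introduced here for illustration.

```r
# Degrees of freedom and confidence-limit multipliers for a smoothed spectrum.
k <- kernel("modified.daniell", 15)
w <- c(rev(k$coef[-1]), k$coef)      # weights for k = -15, ..., 15

nu <- 2 / sum(w^2)                   # degrees of freedom from matching moments
alpha <- 0.05

# Multiply the smoothed estimate by these to get limits for S(f)
lower_mult <- nu / qchisq(1 - alpha / 2, df = nu)
upper_mult <- nu / qchisq(alpha / 2, df = nu)

# On the log scale the interval has the same width at every frequency
log_width <- log(upper_mult) - log(lower_mult)
c(nu = nu, lower = lower_mult, upper = upper_mult, log_width = log_width)
```

Because the multipliers do not depend on f, adding their logarithms to the log of the smoothed spectrum gives a band of constant vertical width.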


Exhibit 14.4 Confidence Limits from the Smoothed Spectral Density


> set.seed(271435); n=200; phi=-0.6
> y=arima.sim(model=list(ar=phi),n=n)
> k=kernel('daniell',m=15)
> sp=spec(y,kernel=k,sub='',xlab='Frequency',
    ylab='Log(Smoothed Spectral Density)',ci.plot=T,ci.col=NULL)
> lines(sp$freq,ARMAspec(model=list(ar=phi),sp$freq,plot=F)$spec,
    lty='dashed')


Exhibit 14.5 shows a less cluttered display of confidence limits. Here a 95%
confidence interval and bandwidth guide is displayed in the upper right-hand corner (the
“crosshairs”). The vertical length gives the length (width) of a confidence interval, while
the horizontal line segment indicates the central point† of the confidence interval, and its
width (length) matches the bandwidth of the spectral window. If you visualize the guide
repositioned with the crosshairs centered on the smoothed spectrum above any
frequency, you have a visual display of a vertical confidence interval for the “true” spectral
density at that frequency and a rough guide of the extent of the smoothing. In this
simulated example, we also show the true spectrum as a dotted line.


Exhibit 14.5 Logarithm of Smoothed Spectrum from Exhibit 14.2


> sp=spec(y,span=31,sub='',xlab='Frequency',
    ylab='Log(Smoothed Sample Spectrum)')
> lines(sp$freq,ARMAspec(model=list(ar=phi),sp$freq,
    plot=F)$spec,lty='dotted')


14.5  Leakage  and  Tapering 


Much of the previous discussion has assumed that the frequencies of interest are the
Fourier frequencies. What happens if that is not the case? Exhibit 14.6 displays the
periodogram of a series of length n = 96 with two pure cosine-sine components at
frequencies f = 0.088 and f = 14/96. The model is simply

$$Y_t = 3\cos[2\pi(0.088)t] + \sin\left[2\pi\left(\frac{14}{96}\right)t\right] \tag{14.5.1}$$

Note that with n = 96, f = 0.088 is not a Fourier frequency. The peak with lower power at
the Fourier frequency f = 14/96 is clearly indicated. However, the peak at f = 0.088 is not
there. Rather, the power at this frequency is blurred across several nearby frequencies,
giving the appearance of a much wider peak.

† The central point is not, in general, halfway between the endpoints, as Equation (14.4.4)
determines asymmetric confidence intervals. In this example, using the modified Daniell
window with m = 15, we have $\nu$ = 61 degrees of freedom, so the chi-square distribution
used is effectively a normal distribution, and the confidence intervals are nearly symmetric.


Exhibit 14.6 Periodogram of Series with Peaks at f = 0.088 and f = 14/96


> win.graph(width=4.875,height=2.5,pointsize=8)
> t=1:96; f1=0.088; f2=14/96
> y=3*cos(f1*2*pi*t)+sin(f2*2*pi*t)
> periodogram(y); abline(h=0)


An algebraic analysis† shows that we may view the periodogram as a “smoothed”
spectral density formed with the Dirichlet kernel spectral window given by

$$D(f) = \frac{1}{n}\frac{\sin(n\pi f)}{\sin(\pi f)} \tag{14.5.2}$$

Note that for all Fourier frequencies f = j/n with j ≠ 0, D(f) = 0, so this window has no effect
whatsoever at those frequencies. However, the plot of D(f) given on the left-hand side of
Exhibit 14.7 shows significant “side lobes” on either side of the main peak. This will
cause power at non-Fourier frequencies to leak into the supposed power at the nearby
Fourier frequencies, as we see in Exhibit 14.6.
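Leakage and the behavior of the Dirichlet kernel can both be seen numerically. The following base R sketch rebuilds the two-component series and evaluates D(f) on and between the Fourier frequencies. The DFT-based scaling of the periodogram and the names `pgram`, `dirichlet`, `d_fourier`, and `d_between` are assumptions made for this illustration, so the ordinates may differ by a constant factor from those plotted in the exhibit.

```r
# Leakage: periodogram of the two-component series and the Dirichlet kernel.
n <- 96
t <- 1:n
y <- 3 * cos(2 * pi * 0.088 * t) + sin(2 * pi * (14 / 96) * t)

# Raw periodogram ordinates at the Fourier frequencies j/n, j = 1, ..., n/2
# (assumed scaling: squared modulus of the DFT divided by n)
pgram <- (Mod(fft(y))^2 / n)[2:(n / 2 + 1)]
freq <- (1:(n / 2)) / n

# Dirichlet kernel D(f) = sin(n*pi*f) / (n*sin(pi*f))
dirichlet <- function(f, n) sin(n * pi * f) / (n * sin(pi * f))

d_fourier <- dirichlet(freq, n)                   # at nonzero Fourier frequencies
d_between <- dirichlet((1:(n / 2) - 0.5) / n, n)  # halfway between them

max(abs(d_fourier))     # essentially zero
range(d_between)        # side lobes of both signs

# Ordinates for j = 6, ..., 16: power near 0.088 (between 8/96 and 9/96)
# is spread over several neighbors, while 14/96 gives an isolated peak
round(pgram[6:16], 2)
```

The ordinates around j = 8 and j = 9 are all sizable, which is the blurred peak described in the text.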

Tapering is one method used to reduce the problem caused by the side lobes. Tapering
involves decreasing the data magnitudes at both ends of the series so that the values
move gradually toward the data mean of zero. The basic idea is to reduce the end effects
of computing a Fourier transform on a series of finite length. If we calculate the
periodogram after tapering the series, the effect is to use the modified Dirichlet kernel
shown on the right-hand side of Exhibit 14.7 for n = 100. Now the side lobes have
essentially disappeared.

† Appendix K on page 381 gives some of the details.

Exhibit 14.7 Dirichlet Kernel and Dirichlet Kernel after Tapering

Tapering replaces the original series $Y_t$ by $\tilde{Y}_t$, with

$$\tilde{Y}_t = h_t Y_t \tag{14.5.3}$$

where, for example, $h_t$ is the cosine bell given by

$$h_t = \frac{1}{2}\left\{1 - \cos\left[\frac{2\pi(t - 0.5)}{n}\right]\right\} \tag{14.5.4}$$


A graph of the cosine bell with n = 100 is given on the left-hand side of Exhibit 14.8. A
much more common taper is given by a split cosine bell that applies the cosine taper
only to the extremes of the time series. The split cosine bell taper is given by

$$h_t =
\begin{cases}
\dfrac{1}{2}\left\{1 - \cos\left[\dfrac{\pi(t - 1/2)}{m}\right]\right\} & \text{for } 1 \leq t \leq m \\
1 & \text{for } m + 1 \leq t \leq n - m \\
\dfrac{1}{2}\left\{1 - \cos\left[\dfrac{\pi(n - t + 1/2)}{m}\right]\right\} & \text{for } n - m + 1 \leq t \leq n
\end{cases} \tag{14.5.5}$$

which is called a 100p% cosine bell taper with p = 2m/n. A 10% split cosine bell taper is
shown on the right-hand side of Exhibit 14.8, again with n = 100. Notice that there is a
10% taper on each end, resulting in a total taper of 20%. In practice, split cosine bell
tapers of 10% or 20% are in common use.
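The split cosine bell weights are easy to generate and to compare with base R's `spec.taper()`. In this sketch, `split_cosine_bell` is a hypothetical helper written for illustration; the comparison assumes that `spec.taper()` interprets its argument `p` as the proportion tapered at each end.

```r
# Split cosine bell taper weights for a series of length n with m points
# tapered at each end.
split_cosine_bell <- function(n, m) {
  t <- 1:n
  h <- rep(1, n)                     # untapered middle section
  left <- t <= m
  right <- t >= n - m + 1
  h[left] <- 0.5 * (1 - cos(pi * (t[left] - 0.5) / m))
  h[right] <- 0.5 * (1 - cos(pi * (n - t[right] + 0.5) / m))
  h
}

n <- 100
h <- split_cosine_bell(n, m = 10)    # 10 points tapered at each end

# Tapering a series of ones with spec.taper() returns its weights
h_stats <- spec.taper(rep(1, n), p = 0.1)
max(abs(h - h_stats))                # agreement up to rounding error

# Applying the taper to a mean-adjusted series y would be: h * (y - mean(y))
```

A plot of `h` against 1:n shows the flat middle section with smooth cosine shoulders at both ends.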
Exhibit 14.8 Cosine Bell and 10% Taper Split Cosine Bell for n = 100


We return to the variable star brightness data first explored on page 325. Exhibit
14.9 displays four periodograms of this series, each with a different amount of tapering.
Judging by the length of the 95% confidence intervals displayed in the respective
“crosshairs”, we see that the two peaks found earlier in the raw untapered periodogram
at frequencies $f_1 = 21/600$ and $f_2 = 25/600$ are clearly real. A more detailed analysis
shows that the minor peaks, seen best in the bottom periodogram, are all in fact harmonics
of the frequencies $f_1$ and $f_2$. There is much more on the topic of leakage reduction and
tapering in Bloomfield (2000).


14.6  Autoregressive  Spectrum  Estimation


In the preceding sections on spectral density estimation, we did not make any
assumptions about the parametric form of the true spectral density. However, an alternative
method for estimating the spectral density would be to consider fitting an AR, MA, or
ARMA model to a time series and then use the spectral density of that model with
estimated parameters as our estimated spectral density. (Section 13.5, page 332, discussed
the spectral densities of ARMA models.) Often AR models are used, with a possibly large
order chosen to minimize the AIC.

As an example, consider the simulated AR series with $\phi = -0.6$ and n = 200 that we
used in Exhibits 13.20, 14.1, 14.2, and 14.5. If we fit an AR model, choosing the order
to minimize the AIC, and then plot the estimated spectral density for that model, we
obtain the results shown in Exhibit 14.10.
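The procedure just described can be sketched with base R alone. This is an illustration under stated assumptions, not the `spec` call used for the exhibit: `ar_spectrum` is a hypothetical helper, the order is selected by `ar()` with `aic = TRUE` and an arbitrary maximum order of 10, and the spectral density is taken to be the noise variance divided by the squared modulus of the AR polynomial evaluated at exp(-2πif).

```r
# Autoregressive spectrum estimation: fit AR by minimum AIC, then evaluate
# the spectral density of the fitted model.
ar_spectrum <- function(phi, sigma2, f) {
  # S(f) = sigma2 / |1 - sum_k phi_k * exp(-2*pi*i*f*k)|^2
  sapply(f, function(fr) {
    k <- seq_along(phi)
    sigma2 / Mod(1 - sum(phi * exp(-2i * pi * fr * k)))^2
  })
}

set.seed(271435)
y <- arima.sim(model = list(ar = -0.6), n = 200)

fit <- ar(y, aic = TRUE, order.max = 10)   # order chosen by minimum AIC
f <- seq(0.001, 0.5, by = 0.001)

est <- ar_spectrum(fit$ar, fit$var.pred, f)  # estimated spectral density
true <- ar_spectrum(-0.6, 1, f)              # spectral density of the true AR(1)
fit$order                                    # selected order
```

Plotting `log(est)` and `log(true)` against `f` gives a comparison in the spirit of the exhibit, although the selected order depends on the simulated series.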


Exhibit 14.10 Autoregressive Estimation of the Spectral Density


> sp=spec(y,method='ar',sub='',xlab='Frequency',
    ylab='Log(AR Spectral Density Estimate)')
> lines(sp$freq,ARMAspec(model=list(ar=phi),freq=sp$freq,
    plot=F)$spec,lty='dotted')


Since these are simulated data, we also show the true spectral density as a dotted
line. In this case, the order was chosen as p = 1 and the estimated spectral density
follows the true density very well. We will show some examples with real time series in
Section 14.8.

14.7 Examples with Simulated Data


A useful way to get a feel for spectral analysis is with simulated data. Here we know
what the answers are and can see what the consequences are when we make choices of
spectral window and bandwidth. We begin with an AR(2) model that contains a fairly
strong peak in its spectrum.

AR(2) with $\phi_1 = 1.5$, $\phi_2 = -0.75$: A Peak Spectrum

The spectral density for this model contained a peak at about f = 0.08, as displayed in
Exhibit 13.14 on page 336. We simulated a time series from this AR(2) model with
normal white noise terms with unit variance and sample size n = 100. Exhibit 14.11 shows
three estimated spectral densities and the true density as a solid line. We used the
modified Daniell spectral window with three different values for span = 2m + 1 of 3, 9, and
15. A span of 3 gives the least amount of smoothing and is shown as a dotted line. A
span of 9 is shown as a dashed line. With span = 15, we obtain the most smoothing, and
this curve is displayed with a dot-dash pattern. The bandwidths of these three spectral
windows are 0.018, 0.052, and 0.087, respectively. The confidence interval and
bandwidth guide displayed apply only to the dotted curve estimate. The two others have
wider bandwidths and shorter confidence intervals. The estimate based on span = 9 is
probably the best one, but it does not represent the peak very well.


Exhibit 14.11 Estimated Spectral Densities


> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(271435); n=100; phi1=1.5; phi2=-.75
> y=arima.sim(model=list(ar=c(phi1,phi2)),n=n)
> sp1=spec(y,spans=3,sub='',lty='dotted',xlab='Frequency',
    ylab='Log(Estimated Spectral Density)')
> sp2=spec(y,spans=9,plot=F); sp3=spec(y,spans=15,plot=F)
> lines(sp2$freq,sp2$spec,lty='dashed')
> lines(sp3$freq,sp3$spec,lty='dotdash')
> f=seq(0.001,.5,by=.001)
> lines(f,ARMAspec(model=list(ar=c(phi1,phi2)),freq=f,
    plot=F)$spec,lty='solid')


We  also  used  the  parametric  spectral  estimation  idea  and  let  the  software  choose  the 
best  AR  model  based  on  the  smallest  AIC.  The  result  was  an  estimated  AR(2)  model 
with  the  spectrum  shown  in  Exhibit  14.12.  This  is  a  very  good  representation  of  the 
underlying  spectrum,  but  of  course  the  model  was  indeed  AR(2). 


Exhibit 14.12 AR Spectral Estimation: Estimated (dotted), True (solid)


> sp4=spec(y,method='ar',lty='dotted',
    xlab='Frequency',ylab='Log(Estimated AR Spectral Density)')
> f=seq(0.001,0.5,by=0.001)
> lines(f,ARMAspec(model=list(ar=c(phi1,phi2)),freq=f,
    plot=F)$spec,lty='solid')
> sp4$method # This will tell you order of the AR model selected


AR(2) with $\phi_1 = 0.1$, $\phi_2 = 0.4$: A Trough Spectrum

Next we look at an AR(2) model with a trough spectrum and a larger sample size. The
true spectrum is displayed in Exhibit 13.15 on page 337. We simulated this model with
n = 200 and unit-variance normal white noise. The three smoothed spectral estimates
shown are based on spans of 7, 15, and 31. As before, the confidence limits and
bandwidth guide correspond to the smallest span of 7 and hence give the narrowest
bandwidth and longest confidence intervals. In our opinion, the middle value of span = 15,
which is about $\sqrt{n}$, gives a reasonable estimate of the spectrum.

Exhibit 14.13 Estimated Spectrum for AR(2) Trough Spectrum Model


> Use the R code for Exhibit 14.11 with new values for the
> parameters.


Exhibit  14.14  shows  the  AR  spectral  density  estimate.  The  minimum  AIC  was 
achieved  at  the  true  order  of  the  underlying  model,  AR(2),  and  the  estimated  spectral 
density  is  quite  good. 


Exhibit 14.14 AR Spectral Estimation: Estimated (dotted), True (solid)


> Use the R code for Exhibits 14.11 and 14.12 with new values
> for the parameters.

ARMA(1,1) with $\phi = 0.5$, $\theta = 0.8$

The true spectral density of the mixed model ARMA(1,1) with $\phi = 0.5$ and $\theta = 0.8$ was
shown in Exhibit 13.17 on page 338. This model has substantial medium- and
high-frequency content but very little power at low frequencies. We simulated this model with a
sample size of n = 500 and unit-variance normal white noise. Using $\sqrt{n} \approx 22$ as a guide
for choosing the span, we show three estimates with spans of 11, 23, and 45 in Exhibit 14.15. The
confidence interval guide indicates that the many peaks produced when span = 11 are
likely spurious (which, in fact, they are). With such a smooth underlying spectrum, the
maximum smoothing shown with span = 45 produces a rather good estimate.


Exhibit 14.15 Spectral Estimates for an ARMA(1,1) Process


> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(324135); n=500; phi=.5; theta=.8
> y=arima.sim(model=list(ar=phi,ma=-theta),n=n)
> sp1=spec(y,spans=11,sub='',lty='dotted',
    xlab='Frequency',ylab='Log(Estimated Spectral Density)')
> sp2=spec(y,spans=23,plot=F); sp3=spec(y,spans=45,plot=F)
> lines(sp2$freq,sp2$spec,lty='dashed')
> lines(sp3$freq,sp3$spec,lty='dotdash')
> f=seq(0.001,.5,by=.001)
> lines(f,ARMAspec(model=list(ar=phi,ma=-theta),f,
    plot=F)$spec,lty='solid')

In this case, a parametric spectral estimate based on AR models does not work well,
as shown in Exhibit 14.16. The software selected an AR(3) model, but the resulting
spectral density (dotted) does not reproduce the true density (solid) well at all.


Exhibit 14.16 AR Spectral Estimate for an ARMA(1,1) Process


> sp4=spec(y,method='ar',lty='dotted',ylim=c(.15,1.9),
    xlab='Frequency',ylab='Log(Estimated AR Spectral Density)')
> f=seq(0.001,.5,by=.001)
> lines(f,ARMAspec(model=list(ar=phi,ma=-theta),f,
    plot=F)$spec,lty='solid')


Seasonal MA with $\theta = 0.4$, $\Theta = 0.9$, and s = 12

For our final example with simulated data, we choose a seasonal process. The
theoretical spectral density is displayed in Exhibit 13.19 on page 340. We simulated n = 144
data points with unit-variance normal white noise. We may think of this as 12 years of
monthly data. We used modified Daniell spectral windows with span = 6, 12, and 24
based on $\sqrt{n} = 12$.

This spectrum contains a lot of detail and is difficult to estimate with only 144
observations. The narrowest spectral window hints at the seasonality, but the two other
estimates essentially smooth out the seasonality. The confidence interval widths
(corresponding to m = 6) do seem to confirm the presence of real seasonal peaks.

Exhibit 14.17 Spectral Estimates for a Seasonal Process


> win.graph(width=4.875,height=2.5,pointsize=8)
> set.seed(247135); n=144; theta=.4; THETA=.9
> y=arima.sim(model=list(ma=c(-theta,rep(0,10),-THETA,theta*THETA
    )),n=n)
> sp1=spec(y,spans=7,sub='',lty='dotted',ylim=c(.15,9),
    xlab='Frequency',ylab='Log(Estimated Spectral Density)')
> sp2=spec(y,spans=13,plot=F); sp3=spec(y,spans=25,plot=F)
> lines(sp2$freq,sp2$spec,lty='dashed')
> lines(sp3$freq,sp3$spec,lty='dotdash')
> f=seq(0.001,.5,by=.001)
> lines(f,ARMAspec(model=list(ma=-theta,seasonal=list(sma=-THETA,
    period=12)),freq=f,plot=F)$spec,lty='solid')


Exhibit 14.18 AR Spectral Estimates for a Seasonal Process


> sp4=spec(y,method='ar',ylim=c(.15,15),lty='dotted',
    xlab='Frequency',ylab='Log(Estimated AR Spectral Density)')
> f=seq(0.001,.5,by=.001)
> lines(f,ARMAspec(model=list(ma=-theta,seasonal=list(sma=-THETA,
    period=12)),freq=f,plot=F)$spec,lty='solid')


Exhibit 14.18 shows the estimated spectrum based on the best AR model. An order
of 13 was chosen based on the minimum AIC, and the seasonality does show up quite
well. However, the peaks are misplaced at the higher frequencies. Perhaps looking at
both Exhibit 14.17 and Exhibit 14.18 we could conclude that the seasonality is real and
that a narrow spectral window provides the best estimate of the underlying spectral
density given the sample size available.

As a final estimate of the spectrum, we use a convolution of two modified Daniell
spectral windows each with span = 3, as displayed in the middle of Exhibit 14.3 on page
354. The estimated spectrum is shown in Exhibit 14.19. This is perhaps the best of the
estimates that we have shown.


Exhibit  14.19  Estimated  Seasonal  Spectrum  with  Convolution  Window 


> sp5=spec(y,spans=c(3,3),sub='',lty='dotted',
   xlab='Frequency',ylab='Log(Estimated Spectral Density)')
> f=seq(0.001,.5,by=.001)
> lines(f,ARMAspec(model=list(ma=-theta,seasonal=list(sma=-THETA,
   period=12)),freq=f,plot=F)$spec,lty='solid')


14.8  Examples  with  Actual  Data 


An  Industrial  Robot 

An  industrial  robot  was  put  through  a  sequence  of  maneuvers,  and  the  distance  from  a 
desired  target  end  position  was  recorded  in  inches.  This  was  repeated  324  times  to  form 
the  time  series  shown  in  Exhibit  14.20. 

> data(robot)
> plot(robot,ylab='End Position Offset',xlab='Time')


Estimates  of  the  spectrum  are  displayed  in  Exhibit  14.21  using  the  convolution  of 
two  modified  Daniell  spectral  windows  with  m  =  7  (solid)  and  with  a  10%  taper  on  each 
end  of  the  series.  A  plot  of  this  spectral  window  is  shown  in  the  middle  of  Exhibit  14.3 
on  page  354.  The  spectrum  was  also  estimated  using  a  fitted  AR(7)  model  (dotted),  the 
order of which was chosen to minimize the AIC. Given the length of the 95% confidence
interval shown, we can conclude that the peak at around a frequency of 0.15 in
both estimates is probably real, but those shown at higher frequencies may well be
spurious. There is a lot of power shown at very low frequencies, and this agrees with the
slowly  drifting  nature  of  the  series  that  may  be  seen  in  the  time  series  plot  in  Exhibit 
14.20. 

> spec(robot,spans=c(7,7),taper=.1,sub='',xlab='Frequency',
   ylab='Log(Spectrum)')
> s=spec(robot,method='ar',plot=F)
> lines(s$freq,s$spec,lty='dotted')


River  Flow 

Exhibit  14.22  shows  monthly  river  flow  for  the  Iowa  River  measured  at  Wapello,  Iowa, 
for  the  period  September  1958  through  August  2006.  The  data  are  quite  skewed  toward 
the  high  values,  but  this  was  greatly  improved  by  taking  logarithms  for  the  analysis. 


Exhibit  14.22  River  Flow  Time  Series 


> data(flow); plot(flow,ylab='River Flow')


The sample size for these data is 576 with a square root of 24. The bandwidth of a
modified Daniell spectral window with a span of about that size is about 0.01. After
some experimentation with several spectral window bandwidths, we decided that such a
window smoothed too much and we instead used a convolution of two such windows, each
with span = 7. The bandwidth of this convolved window is about 0.0044. The smoothed
spectral density estimate is shown as a solid curve in Exhibit 14.23 together with an
estimate based on an AR(7) model (dotted) chosen to minimize the AIC. The prominent
peak at frequency 1/12 represents the strong annual seasonality. There are smaller
secondary peaks at about f ≈ 0.17 and f ≈ 0.25 that correspond to multiples of the
fundamental frequency of 1/12 (2/12 ≈ 0.167 and 3/12 = 0.25). They are higher
harmonics of the annual frequency.
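
The bandwidth figures quoted above can be checked with a short base R calculation. This sketch assumes that the single window has span = 25 (the odd span closest to the square root of n), that bandwidth is measured as in base R's `bandwidth.kernel()` divided by the series length, and that no padding is applied.

```r
n <- 576  # sample size of the river flow series

# Single modified Daniell window, span = 25 assumed, i.e. m = 12
bw_single <- bandwidth.kernel(kernel("modified.daniell", 12)) / n

# Convolution of two modified Daniell windows, each with span = 7 (m = 3)
bw_conv <- bandwidth.kernel(kernel("modified.daniell", c(3, 3))) / n

round(c(single = bw_single, convolved = bw_conv), 4)  # 0.0121 0.0044

# Harmonics of the annual frequency 1/12 (cycles per month)
harmonics <- (1:6) / 12
round(harmonics, 3)  # 0.083 0.167 0.250 0.333 0.417 0.500
```

The convolved window has less than half the bandwidth of the single wide window, which is why it resolves the annual peak and its harmonics more sharply.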


Exhibit 14.23 Log(Spectrum) of Log(Flow)


> spec(log(flow),spans=c(7,7),ylim=c(.02,13),sub='',
   ylab='Log(Spectrum)',xlab='Frequency')
> s=spec(log(flow),method='ar',plot=F)
> lines(s$freq,s$spec,lty='dotted')


Monthly Milk Production

The top portion of Exhibit 11.14 on page 264 showed U.S. monthly milk production
from January 1994 through December 2005. There is a substantial upward trend
together with seasonality. We first remove the upward trend with a simple linear time
trend model and consider the residuals from that regression (the seasonals). After trying
several spectral bandwidths, we decided to use a convolution of two modified Daniell
windows, each with span = 3. We believe that otherwise there was too much smoothing.
This was confirmed by estimating an AR spectrum that ended up fitting an AR of order
15 with peaks at the same frequencies. Notice that the peaks shown in Exhibit 14.24 are
located at frequencies 1/12, 2/12,..., 6/12, with the peak at 1/12 showing the most
power.
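
The detrend-then-smooth procedure can be tried on simulated data with base R's `spec.pgram()`. This is only a sketch: the series below is hypothetical (a linear trend plus an annual cosine plus noise), not the milk data, and base `spec.pgram()` stands in for the `spec()` wrapper used in the exhibit code.

```r
set.seed(42)
n  <- 144                 # 12 years of monthly observations
tt <- 1:n
# Hypothetical series: linear trend + annual cycle + noise (not the milk data)
y <- 0.5 * tt + 10 * cos(2 * pi * tt / 12) + rnorm(n)

# detrend = TRUE removes a least-squares line before the periodogram is computed;
# spans = c(3, 3) applies the convolution of two modified Daniell windows
s <- spec.pgram(ts(y), spans = c(3, 3), detrend = TRUE, plot = FALSE)

# Frequency (cycles per month) at which the smoothed spectrum is largest
peak_freq <- s$freq[which.max(s$spec)]
c(peak = peak_freq, annual = 1 / 12)
```

With the trend removed, the largest peak sits at the annual frequency 1/12 rather than being swamped by low-frequency power from the trend.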


Exhibit  14.24  Estimated  Spectrum  for  Milk  Production  Seasonals 


> data(milk)
> spec(milk,spans=c(3,3),detrend=T,sub='',
   ylab='Estimated Log(Spectrum)',xlab='Frequency')
> abline(v=(1:6)/12,lty='dotted')


For a final example in this section, consider the time series shown in Exhibit 14.25.
These plots display the first 400 points of two time series of lengths 4423 and 4417,
respectively. The complete series were created by recording a trombonist and a
euphoniumist each sustaining a B flat (just below middle C) for about 0.4 seconds. The
original recording produced data sampled at 44.1 kHz, but this was reduced by subsampling
every fourth data point for the analysis shown. Trombones and euphonia are both brass
wind instruments that play in the same range, but they have different sized and shaped
tubing. The euphonium has larger tubing (a larger bore) that is mostly conical in shape,
while the tenor trombone is mostly cylindrical in shape and has a smaller bore. The
euphonium sound is considered more mellow than the bright, brassy sound of the
trombone. When one listens to these notes being played, they sound rather similar. Our
question is: Does the tubing shape and size affect the harmonics (overtones) enough that
the differences may be seen in the spectra of these sounds?


Exhibit  14.25  Trombone  and  Euphonium  Playing  Bb 

> win.graph(width=4.875,height=4,pointsize=8)
> data(tbone); data(euph); oldpar=par(mfrow=c(2,1))
> trombone=(tbone-mean(tbone))/sd(tbone)
> euphonium=(euph-mean(euph))/sd(euph)
> plot(window(trombone,end=400),main='Trombone Bb',
   ylab='Waveform',yaxp=c(-1,+1,2))
> plot(window(euphonium,end=400),main='Euphonium Bb',
   ylab='Waveform',yaxp=c(-1,+1,2)); par(oldpar)


Exhibit  14.26  displays  the  estimated  spectra  for  the  two  waveforms.  The  solid  curve 
is  for  the  euphonium,  and  the  dotted  curve  is  for  the  trombone.  We  used  the  convolution 
of  two  modified  Daniell  spectral  windows,  each  with  span  =  11,  on  both  series.  Since 
both  series  are  essentially  the  same  length,  the  bandwidths  will  both  be  about  0.0009 
and  barely  perceptible  on  the  bandwidth/confidence  interval  crosshair  shown  on  the 
graph. 

The first four major peaks occur at the same frequencies, but clearly the trombone
has much more spectral power at distinct higher harmonic frequencies. It is suggested
that this may account for the more brassy nature of the trombone sound as opposed to
the more mellow sound of the euphonium.
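
To relate the frequency axis (cycles per sample) to musical pitch, a rough conversion can be worked out. This sketch rests on assumptions that are not stated in the text: an original sampling rate of 44.1 kHz (the standard audio rate), equal temperament with A above middle C at 440 Hz, and a perfectly tuned note.

```r
fs_original <- 44100        # assumed original sampling rate in Hz
fs <- fs_original / 4       # rate after keeping every fourth data point

# B flat just below middle C, assuming equal temperament with A4 = 440 Hz
f0_hz <- 220 * 2^(1 / 12)   # about 233.08 Hz

# Fundamental and next five harmonics, in cycles per sample
harmonics <- (1:6) * f0_hz / fs
round(harmonics, 4)         # 0.0211 0.0423 0.0634 0.0846 0.1057 0.1268

# Duration implied by the length of the trombone series
4423 / fs                   # about 0.4 seconds
```

Under these assumptions the harmonics of the note would be expected near multiples of 0.021 on the frequency axis, and the implied duration agrees with the 0.4 seconds quoted in the text.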


Exhibit  14.26  Spectra  for  Trombone  (dotted)  and  Euphonium  (solid) 


> win.graph(width=4.875,height=2.5,pointsize=8)
> spec(euph,spans=c(11,11),ylab='Log Spectra',
   xlab='Frequency',sub='')
> s=spec(tbone,spans=c(11,11),plot=F)
> lines(s$freq,s$spec,lty='dotted')


14.9  Other  Methods  of  Spectral  Estimation 


Prior  to  widespread  use  of  the  fast  Fourier  transform,  computing  and  smoothing  the 
sample  spectrum  was  extremely  intensive  computationally — especially  for  long  time 
series. Lag window estimators were used to partially mitigate the computational
difficulties.


Lag  Window  Estimators 


Consider  the  sample  spectrum  and  smoothed  sample  spectrum.  We  have 


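$$\bar{S}(f) = \sum_{k=-m}^{m} W(k) \sum_{j=-n+1}^{n-1} \hat{\gamma}_j\, e^{-2\pi i \left(f + \frac{k}{n}\right) j}
 = \sum_{j=-n+1}^{n-1} \hat{\gamma}_j \left[ \sum_{k=-m}^{m} W(k)\, e^{-2\pi i \frac{k}{n} j} \right] e^{-2\pi i f j}$$    (14.9.1)

or

$$\bar{S}(f) = \sum_{j=-n+1}^{n-1} \hat{\gamma}_j\, w\!\left(\frac{j}{n}\right) e^{-2\pi i f j}$$    (14.9.2)

where

$$w\!\left(\frac{j}{n}\right) = \sum_{k=-m}^{m} W(k)\, e^{-2\pi i \frac{k}{n} j}$$    (14.9.3)
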


Equation  (14.9.2)  suggests  defining  and  investigating  a  class  of  spectral  estimators 
defined  as 

$$\tilde{S}(f) = \sum_{j=-n+1}^{n-1} w\!\left(\frac{j}{n}\right) \hat{\gamma}_j \cos(2\pi f j)$$    (14.9.4)


where  the  function  w(x)  has  the  properties 
$$w(x) = w(-x), \qquad w(0) = 1, \qquad |w(x)| \le 1 \ \text{for}\ |x| \le 1$$    (14.9.5)

The  function  w(x)  is  called  a  lag  window  and  determines  how  much  weight  is  given  to 
the  sample  autocovariance  at  each  lag. 

The  rectangular  lag  window  is  defined  by 

$$w(x) = 1 \ \text{for}\ |x| \le 1$$    (14.9.6)

and  the  corresponding  lag  window  spectral  estimator  is  simply  the  sample  spectrum. 
This estimator clearly gives too much weight to large lags where the sample
autocovariances are based on too few data points and are unreliable.

The next simplest lag window is the truncated rectangular lag window, which simply
omits large lags from the computation. It is defined as

$$w\!\left(\frac{j}{n}\right) = 1 \ \text{for}\ |j| \le m$$    (14.9.7)



where  the  computational  advantage  is  achieved  by  choosing  m  much  smaller  than  n. 

The  triangular,  or  Bartlett,  lag  window  downweights  higher  lags  linearly  and  is 
defined  as 


$$w\!\left(\frac{j}{n}\right) = 1 - \frac{|j|}{m} \ \text{for}\ |j| \le m$$    (14.9.8)
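
The lag window estimator of Equation (14.9.4) is short enough to sketch directly in base R. The function `lag_window_spec` and its arguments are hypothetical names introduced here for illustration; the sketch assumes sample autocovariances with divisor n (the default of `acf()`) and sets the window weight to zero beyond `max_lag`. The simulated AR(1) series is only a test input.

```r
# Lag window spectral estimate at frequencies f (cycles per observation).
# w gives the window weight for lag j >= 1; lags beyond max_lag get weight zero.
lag_window_spec <- function(y, f, w, max_lag) {
  g <- acf(y, lag.max = max_lag, type = "covariance",
           plot = FALSE, demean = TRUE)$acf[, 1, 1]  # gamma_0, ..., gamma_max_lag
  j <- 1:max_lag
  # Symmetry of w and gamma reduces the two-sided sum to lag 0 plus twice lags 1..max_lag
  sapply(f, function(fr) g[1] + 2 * sum(w(j) * g[j + 1] * cos(2 * pi * fr * j)))
}

set.seed(123)
n <- 100
y <- arima.sim(model = list(ar = 0.7), n = n)
f <- (1:(n / 2)) / n                      # Fourier frequencies

# Rectangular window over all lags: reproduces the sample spectrum
S_rect <- lag_window_spec(y, f, function(j) rep(1, length(j)), n - 1)

# Bartlett (triangular) window truncated at m = 10
m <- 10
S_bart <- lag_window_spec(y, f, function(j) 1 - j / m, m)

# Direct periodogram for comparison: |sum (y_t - ybar) e^{-2 pi i f t}|^2 / n
yc <- as.numeric(y) - mean(y)
I_direct <- sapply(f, function(fr) Mod(sum(yc * exp(-2i * pi * fr * (1:n))))^2 / n)

max(abs(S_rect - I_direct))   # numerically zero
```

The Bartlett estimate `S_bart` uses only 10 autocovariances instead of 99, which is the computational saving the text refers to, and it is much smoother than `S_rect`.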


Other common lag windows are associated with the names of Parzen, Tukey-Hamming,
and Tukey-Hanning. We will not pursue these further here, but much more information
on the lag window approach to spectral estimation may be found in the books of
Bloomfield (2000), Brillinger (2001), Brockwell and Davis (1991), and Priestley
(1981).

Other Smoothing Methods

Other  methods  for  smoothing  the  sample  spectrum  have  been  proposed.  Kooperberg  et 
al.  (1995)  proposed  using  splines  to  estimate  the  spectral  distribution.  Fan  and 
Kreutzberger  (1998)  investigated  local  smoothing  polynomials  and  Whittle's  likelihood 
for  spectral  estimation.  This  approach  uses  automatic  bandwidth  selection  to  smooth  the 
sample  spectrum.  See  also  Yoshihide  (2006),  Jiang  and  Hui  (2004),  and  Fay  et  al. 
(2002). 

14.10  Summary 


Given  the  undesirable  characteristics  of  the  sample  spectral  density,  we  introduced  the 
smoothed  sample  spectral  density  and  showed  that  it  could  be  constructed  to  improve 
the  properties.  The  important  topics  of  bias,  variance,  leakage,  bandwidth,  and  tapering 
were  investigated.  A  procedure  for  forming  confidence  intervals  was  discussed,  and  all 
of  the  ideas  were  illustrated  with  both  real  and  simulated  time  series  data. 


Exercises 


14.1 Consider the variance of $\bar{S}(f)$ with the Daniell spectral window. Instead of using
Equation (14.2.4) on page 355, use the fact that $2\hat{S}(f)/S(f)$ has approximately a
chi-square distribution with two degrees of freedom to show that the smoothed
sample spectral density has an approximate variance of $S^2(f)/(2m+1)$.

14.2  Consider  various  convolutions  of  the  simple  Daniell  rectangular  spectral  window. 

(a) Construct a panel of three plots similar to those shown in Exhibit 14.3 on page
354 but with the Daniell spectral window and with m = 5. The middle graph
should be the convolution of two Daniell windows and the rightmost graph the
convolution of three Daniell windows.

(b) Evaluate the bandwidths and degrees of freedom for each of the spectral windows
constructed in part (a). Use n = 100.

(c) Construct another panel of three plots similar to those shown in Exhibit 14.3
but with the modified Daniell spectral window. This time use m = 5 for the
first graph and convolve two with m = 5 and m = 7 for the second. Convolve
three windows with m's of 5, 7, and 11 for the third graph.

(d) Evaluate the bandwidths and degrees of freedom for each of the spectral windows
constructed in part (c). Use n = 100.

14.3 For the Daniell rectangular spectral window show that

(a) $$\frac{1}{n^2}\sum_{k=-m}^{m} k^2 W_m(k) = \frac{2}{n^2(2m+1)}\left(\frac{m^3}{3} + \frac{m^2}{2} + \frac{m}{6}\right)$$

(b) Show that if m is chosen as $m = c\sqrt{n}$ for any constant c, then the right-hand
side of the expression in part (a) tends to zero as n goes to infinity.

(c) Show that if $m = c\sqrt{n}$ for any constant c, then the approximate variance of the
smoothed spectral density given by the right-hand side of Equation (14.2.4) on
page 355 tends to zero as n tends to infinity.

14.4 Suppose that the distribution of $\bar{S}(f)$ is to be approximated by a multiple of a
chi-square variable with degrees of freedom $\nu$, so that $\bar{S}(f) \approx c\chi^2_\nu$. Using the
approximate variance of $\bar{S}(f)$ given in Equation (14.2.4) on page 355 and the fact
that $\bar{S}(f)$ is approximately unbiased, equate means and variances and find the
values for c and $\nu$ (thus establishing Equation (14.4.2) on page 356).

14.5 Construct a time series of length n = 48 according to the expression

$$Y_t = \sin[2\pi(0.28)t]$$

Display the periodogram of the series and explain its appearance.

14.6  Estimate  the  spectrum  of  the  Los  Angeles  annual  rainfall  time  series.  The  data  are 
in  the  file  named  larain.  Because  of  the  skewness  in  the  series,  use  the  logarithms 
of  the  raw  rainfall  values.  The  square  root  of  the  series  length  suggests  a  value  for 
the span of about 11. Use the modified Daniell spectral window, and be sure to set
the  vertical  limits  of  the  plot  so  that  you  can  see  the  whole  confidence  interval 
guide.  Comment  on  the  estimated  spectrum. 

14.7 The file named spots1 contains annual sunspot numbers for 306 years from 1700
through  2005. 

(a)  Display  the  time  series  plot  of  these  data.  Does  stationarity  seem  reasonable 
for  this  series? 

(b) Estimate the spectrum using a modified Daniell spectral window convolved
with itself, with span = 3 for both components. Interpret the plot.

(c)  Estimate  the  spectrum  using  an  AR  model  with  the  order  chosen  to  minimize 
the  AIC.  Interpret  the  plot.  What  order  was  selected? 

(d)  Overlay  the  estimates  obtained  in  parts  (b)  and  (c)  above  onto  one  plot.  Do 
they  agree  to  a  reasonable  degree? 

14.8  Consider  the  time  series  of  average  monthly  temperatures  in  Dubuque,  Iowa.  The 
data  are  in  the  file  named  tempdub  and  cover  from  January  1964  to  December 
1975  for  an  n  of  144. 

(a)  Estimate  the  spectrum  using  a  variety  of  span  values  for  the  modified  Daniell 
spectral  window. 

(b)  In  your  opinion,  which  of  the  estimates  in  part  (a)  best  represents  the  spectrum 
of  the  process?  Be  sure  to  use  bandwidth  considerations  and  confidence  limits 
to  back  up  your  argument. 

14.9 An EEG (electroencephalogram) time series is given in the data file named eeg.
An electroencephalogram is a noninvasive test used to detect and record the
electrical activity generated in the brain. These data were measured at a sampling rate
of 256 per second and came from a patient suffering a seizure. The total record
length is n = 13,000, or slightly less than one minute.

(a)  Display  the  time  series  plot  and  decide  if  stationarity  seems  reasonable. 

(b) Estimate the spectrum using a modified Daniell spectral window convolved
with itself and a span of 51 for both components of the convolution. Interpret
the plot.

(c) Estimate the spectrum using an AR model with the order chosen to minimize
the AIC. Interpret the plot. What order was selected?

(d)  Overlay  the  estimates  obtained  in  parts  (b)  and  (c)  above  onto  one  plot.  Do 
they  agree  to  a  reasonable  degree? 

14.10 The file named electricity contains monthly U.S. electricity production values
from January 1994 to December 2005. A time series plot of the logarithms of
these values is shown in Exhibit 11.14 on page 264. Since there is an upward
trend and increasing variability at higher levels in these data, use the first
difference of the logarithms for the remaining analysis.

(a) Construct a time series plot of the first difference of the logarithms of the
electricity values. Does a stationary model seem warranted at this point?

(b)  Display  the  smoothed  spectrum  of  the  first  difference  of  the  logarithms  using 
a  modified  Daniell  spectral  window  and  span  values  of  25,  13,  and  7.  Interpret 
the  results. 

(c) Now use a spectral window that is a convolution of two modified Daniell
windows, each with span = 3. Also use a 10% taper. Interpret the results.

(d)  Estimate  the  spectrum  using  an  AR  model  with  the  order  chosen  to  minimize 
the  AIC.  Interpret  the  plot.  What  order  was  selected? 

(e)  Overlay  the  estimates  obtained  in  parts  (c)  and  (d)  above  onto  one  plot.  Do 
they  agree  to  a  reasonable  degree? 

14.11  Consider  the  monthly  milk  production  time  series  used  in  Exhibit  14.24  on  page 
374.  The  data  are  in  the  file  named  milk. 

(a)  Estimate  the  spectrum  using  a  spectral  window  that  is  a  convolution  of  two 
modified  Daniell  windows  each  with  span  =  7.  Compare  these  results  with 
those  shown  in  Exhibit  14.24. 

(b)  Estimate  the  spectrum  using  a  single  modified  Daniell  spectral  window  with 
span  =  7.  Compare  these  results  with  those  shown  in  Exhibit  14.24  and  those 
in  part  (a). 

(c) Finally, estimate the spectrum using a single modified Daniell spectral window
with span = 11. Compare these results with those shown in Exhibit 14.24
and those in parts (a) and (b).

(d)  Among  the  four  different  estimates  considered  here,  which  do  you  prefer  and 
why? 

14.12 Consider the river flow series displayed in Exhibit 14.22 on page 372. An
estimate of the spectrum is shown in Exhibit 14.23 on page 373. The data are in the
file named flow.

(a) Here n = 576 and $\sqrt{n}$ = 24. Estimate the spectrum using span = 25 with the
modified Daniell spectral window. Compare your results with those shown in
Exhibit 14.23.

(b) Estimate the spectrum using span = 13 with the modified Daniell spectral
window and compare your results to those obtained in part (a) and in Exhibit
14.23.

14.13  The  time  series  in  the  file  named  tuba  contains  about  0.4  seconds  of  digitized 
sound  from  a  tuba  playing  a  B  flat  one  octave  and  one  note  below  middle  C. 

(a)  Display  a  time  series  plot  of  the  first  400  of  these  data  and  compare  your 
results  with  those  shown  in  Exhibit  14.25  on  page  375,  for  the  trombone  and 
euphonium. 

(b) Estimate the spectrum of the tuba time series using a convolution of two
modified Daniell spectral windows, each with span = 11.

(c) Compare the estimated spectrum obtained in part (b) with those of the
trombone and euphonium shown in Exhibit 14.26 on page 376. (You may want to
overlay several of these spectra.) Remember that the tuba is playing one
octave lower than the two other instruments.

(d)  Do  the  higher-frequency  components  of  the  spectrum  for  the  tuba  look  more 
like  those  of  the  trombone  or  those  of  the  euphonium?  (Hint:  The  euphonium 
is  sometimes  called  a  tenor  tuba!) 

Suppose $Y_t = \cos(2\pi f_0 t + \Phi)$ for t = 1, 2,..., n, where $f_0$ is not necessarily a Fourier
frequency. Since it will not affect the periodogram, we will actually suppose that

$$Y_t = e^{2\pi i f_0 t}$$    (14.K.1)

in order to simplify the mathematics. Then the discrete-time Fourier transform of this
sequence is given by

$$\frac{1}{n}\sum_{t=1}^{n} Y_t\, e^{-2\pi i f t} = \frac{1}{n}\sum_{t=1}^{n} e^{2\pi i (f_0 - f) t}$$    (14.K.2)

By Equations (13.J.7) and (13.J.8) on page 350, for any z,

$$\frac{1}{n}\sum_{t=1}^{n} e^{2\pi i z t} = \frac{1}{n}\, e^{2\pi i z}\, \frac{e^{2\pi i n z} - 1}{e^{2\pi i z} - 1}
 = \frac{1}{n}\, e^{\pi i (n+1) z}\, \frac{e^{\pi i n z} - e^{-\pi i n z}}{e^{\pi i z} - e^{-\pi i z}}$$

so that

$$\frac{1}{n}\sum_{t=1}^{n} e^{2\pi i z t} = e^{\pi i (n+1) z} \left[ \frac{1}{n}\, \frac{\sin(\pi n z)}{\sin(\pi z)} \right]$$    (14.K.3)

The function

$$D(z) = \frac{1}{n}\, \frac{\sin(\pi n z)}{\sin(\pi z)}$$    (14.K.4)

is the Dirichlet kernel shown on the left-hand side of Exhibit 14.7 on page 360 for n =
100. These results lead to the following relationship for the periodogram of $Y_t$:

$$I(f) \propto |D(f - f_0)|^2$$    (14.K.5)



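Remember that for all nonzero Fourier frequencies D(f) = 0, so that this window has no
effect at those frequencies. Leakage occurs when there is substantial power at
non-Fourier frequencies. Now consider tapering $Y_t$ with a cosine bell. We have

$$\tilde{Y}_t = \frac{1}{2}\left\{1 - \cos\left[\frac{2\pi(t - 0.5)}{n}\right]\right\} Y_t
 = \frac{1}{2}\, e^{2\pi i f_0 t} - \frac{1}{4}\, e^{2\pi i f_0 t + 2\pi i (t - 1/2)/n} - \frac{1}{4}\, e^{2\pi i f_0 t - 2\pi i (t - 1/2)/n}$$    (14.K.6)

and after some more algebra we obtain

$$\frac{1}{n}\sum_{t=1}^{n} \tilde{Y}_t\, e^{-2\pi i f t} = e^{\pi i (n+1)(f_0 - f)} \left[ \frac{1}{4} D\!\left(f - f_0 - \frac{1}{n}\right) + \frac{1}{2} D(f - f_0) + \frac{1}{4} D\!\left(f - f_0 + \frac{1}{n}\right) \right]$$    (14.K.7)

The function

$$\tilde{D}(f) = \frac{1}{4} D\!\left(f - \frac{1}{n}\right) + \frac{1}{2} D(f) + \frac{1}{4} D\!\left(f + \frac{1}{n}\right)$$    (14.K.8)

is the tapered or modified Dirichlet kernel that is plotted on the right-hand side of
Exhibit 14.7 on page 360 for n = 100. The periodogram of the tapered series is
proportional to $|\tilde{D}(f - f_0)|^2$, and the side lobe problem is substantially mitigated.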
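
The two kernels are easy to evaluate numerically. The following base R sketch uses hypothetical function names (`dirichlet`, `dirichlet_tapered`), checks the identity in Equation (14.K.3) at one arbitrary value of z, and compares the largest side lobe of each kernel outside the main lobe of the tapered kernel, taken here as |z| > 2/n.

```r
# Dirichlet kernel D(z) = sin(pi n z) / (n sin(pi z)), using its limit at integer z
dirichlet <- function(z, n) {
  s <- sin(pi * z)
  ifelse(abs(s) < 1e-12, cos(pi * n * z) / cos(pi * z), sin(pi * n * z) / (n * s))
}

# Tapered (modified) Dirichlet kernel arising from the cosine bell
dirichlet_tapered <- function(z, n) {
  0.25 * dirichlet(z - 1 / n, n) + 0.5 * dirichlet(z, n) +
    0.25 * dirichlet(z + 1 / n, n)
}

n <- 100

# Check (1/n) sum_t exp(2 pi i z t) = exp(pi i (n + 1) z) D(z) at an arbitrary z
z0  <- 0.137
lhs <- sum(exp(2i * pi * z0 * (1:n))) / n
rhs <- exp(1i * pi * (n + 1) * z0) * dirichlet(z0, n)
Mod(lhs - rhs)                       # numerically zero

# Largest side lobe magnitude for z beyond 2.5/n
z <- seq(2.5 / n, 0.5, by = 0.0001)
max_side_plain   <- max(abs(dirichlet(z, n)))
max_side_tapered <- max(abs(dirichlet_tapered(z, n)))
c(plain = max_side_plain, tapered = max_side_tapered)
```

The tapered kernel has a wider main lobe but far smaller side lobes, which is the trade-off that tapering makes to reduce leakage.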


Chapter  15 

Threshold  Models 


It can be shown (Wold, 1948) that any weakly stationary process $\{Y_t\}$ admits the Wold
decomposition

$$Y_t = U_t + e_t + \psi_1 e_{t-1} + \psi_2 e_{t-2} + \cdots$$

where $e_t$ equals the deviation of $Y_t$ from the best linear predictor based on all past Y
values, and $\{U_t\}$ is a purely deterministic stationary process, with $e_t$ being uncorrelated
with $U_s$ for any t and s. A purely deterministic process is a process that can be
predicted to arbitrary accuracy (that is, with arbitrarily small mean squared error) by some
linear predictors of finitely many past lags of the process. A simple example of a purely
deterministic process is $U_t = \mu$, a constant. A more subtle example is the random cosine
wave model introduced on page 18. In essence, $\{U_t\}$ represents the stochastic,
stationary “trend” in the data. The prediction errors $\{e_t\}$ are a white noise sequence, and $e_t$
represents the “new” component making up $Y_t$ and hence is often called the innovation of
the process. The Wold decomposition then states that any weakly stationary process is
the sum of a (possibly infinite-order) MA process and a deterministic trend. Thus, we
can compute the best linear predictor within the framework of MA(∞) processes that
can further be approximated by finite-order ARMA processes. The Wold decomposition
thus guarantees the versatility of the ARMA models in prediction with stationary
processes.

However, except for convenience, there is no reason for restricting to linear
predictors. If we allow nonlinear predictors and seek the best predictor of $Y_t$ based on past
values of Y that minimizes the mean squared prediction error, then the best predictor need
no longer be the best linear predictor. The solution is simply the conditional mean of $Y_t$
given all past Y values. The Wold decomposition makes it clear that the best
one-step-ahead linear predictor is the best one-step-ahead predictor if and only if $\{e_t\}$ in the Wold
decomposition satisfies the condition that the conditional mean of $e_t$ given past e's is
identically equal to 0. The $\{e_t\}$ satisfying the latter condition is called a sequence of
martingale differences, so the condition will be referred to as the martingale difference
condition. The martingale difference condition holds if, for example, $\{e_t\}$ is a sequence
of independent, identically distributed random variables with zero mean. But it also
holds if $\{e_t\}$ is some GARCH process. Nonetheless, when the martingale difference
condition fails, nonlinear prediction will lead to a more accurate prediction. Hannan
(1973) defines a linear process to be one where the best one-step-ahead linear predictor
is the best one-step-ahead predictor.

The time series models discussed so far are essentially linear models in the sense
that, after suitable instantaneous transformation, the one-step-ahead conditional mean is
a linear function of the current and past values of the time series variable. If the errors
are normally distributed, as is commonly assumed, a linear ARIMA model results in a
normally distributed process. Linear time series methods have proved to be very useful
in practice. However, linear, normal processes do suffer from some limitations. For
example, a stationary normal process is completely characterized by its mean and
autocovariance function; hence the process reversed in time has the same distribution as the
original process. The latter property is known as time reversibility. Yet, many real
processes appear to be time-irreversible. For example, the historical daily closing price of a
stock generally rose gradually but, if it crashed, it did so precipitously, signifying a
time-irreversible data mechanism. Moreover, the one-step-ahead conditional mean may
be nonlinear rather than linear in the current and past values. For example, animal
abundance processes may be nonlinear due to finite-resource constraints. Specifically, while
moderately high abundance in one period is likely to be followed by higher abundance
in the next period, extremely high abundance may lead to a population crash in the
ensuing periods. Nonlinear time series models generally display rich dynamical structure.
Indeed, May (1976) showed that a very simple nonlinear deterministic difference
equation may admit chaotic solutions in the sense that its time series solutions are sensitive
to the initial values and yet may appear to be indistinguishable from a white noise
sequence based on correlation analysis. Nonlinear time series analysis thus may provide
more accurate predictions, with gains that can be very substantial in certain parts of the
state space, and shed novel insights on the underlying dynamics of the data. Nonlinear time
series analysis was earnestly initiated around the late 1970s, prompted by the need for
modeling the nonlinear dynamics shown by real data; see Tong (2007). Except for cases
with well-developed theory accounting for the underlying mechanism of an observed
time series, the nonlinear data mechanism is generally unknown. Thus, a fundamental
problem of empirical nonlinear time series analysis concerns the choice of a general
nonlinear class of models. Here, our goal is rather modest in that we introduce the
threshold model, which is one of the most important classes of nonlinear time series
models. For a systematic account of nonlinear time series analysis and chaos, see Tong
(1990) and Chan and Tong (2001).
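
The point about chaotic solutions can be illustrated with a short simulation. This sketch assumes the logistic map with parameter 4 as the "very simple nonlinear deterministic difference equation"; the starting value 0.3 and the series length are arbitrary choices made here.

```r
n <- 2000
x <- numeric(n)
x[1] <- 0.3                                   # arbitrary starting value in (0, 1)
for (i in 2:n) x[i] <- 4 * x[i - 1] * (1 - x[i - 1])   # logistic map

# Sample autocorrelations at lags 1 to 10 resemble those of white noise
r <- acf(x, lag.max = 10, plot = FALSE)$acf[-1]
round(r, 3)

# Yet the next value is an exact nonlinear function of the current one
cor_linear    <- cor(x[-1], x[-n])                  # near zero
cor_nonlinear <- cor(x[-1], x[-n] * (1 - x[-n]))    # equal to 1 up to rounding

# Sensitivity to initial values: perturb the start by 1e-10
x2 <- numeric(n)
x2[1] <- 0.3 + 1e-10
for (i in 2:n) x2[i] <- 4 * x2[i - 1] * (1 - x2[i - 1])
max(abs(x - x2)[1:20])     # still small
max(abs(x - x2)[50:100])   # of order one
```

Correlation analysis alone would suggest there is nothing to predict, while the nonlinear one-step-ahead predictor is exact; this is the kind of gain the text attributes to nonlinear methods.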

15.1  Graphically  Exploring  Nonlinearity 


In ARIMA modeling, the innovation (error) process is often specified as independent
and identically normally distributed. The normal error assumption implies that the
stationary time series is also a normal process; that is, any finite set of time series
observations are jointly normal. For example, the pair $(Y_1, Y_2)$ has a bivariate normal
distribution and so does any pair of Y's; the triple $(Y_1, Y_2, Y_3)$ has a trivariate normal
distribution and so does any triple of Y's, and so forth. When data are nonnormal,
instantaneous transformation of the form $h(Y_t)$, for example, $h(Y_t) = \sqrt{Y_t}$, may be applied to
the data in the hope that a normal ARIMA model can serve as a good approximation to
the underlying data-generating mechanism. The normality assumption is mainly
adopted for convenience in statistical inference. In practice, an ARIMA model with
nonnormal innovations may be entertained. Indeed, such processes have very rich and
sometimes exotic dynamics; see Tong (1990). If the normal error assumption is
maintained, then a nonlinear time series is generally not normally distributed. Nonlinearity
may then be explored by checking whether or not a finite set of time series observations
are jointly normal; for example, whether or not the two-dimensional distribution of pairs
of Y's is normal. This can be checked by plotting the scatter diagram of $Y_t$ against $Y_{t-1}$
or $Y_{t-2}$, and so forth. For a bivariate normal distribution, the scatter diagram should
resemble an elliptical data cloud with decreasing density from its center. Departure from
such a pattern (for example, existence of a large hole in the data cloud) may signify that
the data are nonnormal and the underlying process may be nonlinear.

Exhibit 15.1 shows the scatter diagrams of $Y_t$ versus its lag 1 to lag 6, where we
simulated data from the ARMA(2,1) model

$$Y_t = 1.6Y_{t-1} - 0.94Y_{t-2} + e_t - 0.64e_{t-1}$$    (15.1.1)

with the innovations being standard normal. Note that the data clouds in the scatter
diagrams are roughly elliptically shaped.

To help us visualize the relationship between the response and its lags, we draw
fitted nonparametric regression lines on each scatter diagram. For example, on the scatter
diagram of $Y_t$ against $Y_{t-1}$, a nonparametric estimate of the conditional mean function
of $Y_t$ given $Y_{t-1}$, also referred to as the lag 1 regression function, is superimposed.
(Specifically, the lag 1 regression function equals $m_1(y) = E(Y_t \mid Y_{t-1} = y)$ as a function of y.) If
the underlying process is linear and normal, the true lag 1 regression function must be
linear and so we expect the nonparametric estimate of it to be close to a straight line. On
the other hand, a curved lag 1 regression estimate may suggest that the underlying
process is nonlinear. Similarly, one can explore the lag 2 regression function (that is, the
conditional mean of $Y_t$ given $Y_{t-2} = y$) as a function of y and higher-lag analogues. In
the case of strong departure from linearity, the shape of these regression functions may
provide some clue as to what nonlinear model may be appropriate for the data. Note that
all lagged regression curves in Exhibit 15.1 are fairly straight, suggesting that the
underlying process is linear, which indeed we know is the case.
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Exhibit  15.1  Lagged  Regression  Plots  for  a  Simulated  ARMA(2,1) 
Process.  Solid  lines  are  fitted  regression  curves. 

lag  1  regression  plot  lag  2  regression  plot 


>  win . graph (width=4 . 875 ,  height=6 . 5 , pointsize=8 ) 

>  set . seed (2534567)  ;  par (mf row=c (3 , 2  )  ) 

>  y=arima . sim (n=61 , model  =  list (ar=c(1.6, -0.94)  ,ma=-0.64)  ) 

>  lagplot (y) 


We now illustrate the technique of a lagged regression plot with a real example.
Exhibit 15.2 plots an experimental time series response as the number of individuals
(Didinium nasutum, a protozoan) per ml measured every twelve hours over a period of
35 days; see Veilleux (1976) and Jost and Ellner (2000). The experiment studied the
population fluctuation of a prey-predator system; the prey is Paramecium aurelia, a
unicellular ciliate protozoan, whereas the predator species is Didinium nasutum. The initial
part of the data appears to be nonstationary owing to transient effects. It can be seen that
the increasing phase of the series is generally longer than the decreasing phase,
suggesting that the time series is time-irreversible. Below, we shall omit the first 14 data
points from the analysis; that is, only the (log-transformed) data corresponding to the
solid curve in Exhibit 15.2 are used in subsequent analysis.


Exhibit 15.2 Logarithmically Transformed Number of Predators. The
stationary part of the time series is displayed as a solid line.
Solid circles indicate data in the lower regime of a fitted
threshold autoregressive model.

> data(veilleux); predator=veilleux[,1]
> win.graph(width=4.875,height=2.5,pointsize=8)
> plot(log(predator),lty=2,type='o',xlab='Day',
   ylab='Log(predator)')
> predator.eq=window(predator,start=c(7,1))
> lines(log(predator.eq))
> index1=zlag(log(predator.eq),3)<=4.661
> points(y=log(predator.eq)[index1],(time(predator.eq))[index1],
   pch=19)


Exhibit 15.3 shows the lagged regression plots of the predator series. Notice that
several scatter diagrams have a large hole in the center, hinting that the data are
nonnormal. Also, the regression function estimates appear to be strongly nonlinear for
lags 2 to 4, suggesting a nonlinear data mechanism; in fact, the histogram (not shown)
suggests that the series is bimodal.

Exhibit 15.3 Lagged Regression Plots for the Predator Series

[Figure panels: lag 1 through lag 6 regression plots]

> win.graph(width=4.875,height=6.5,pointsize=8)
> data(predator.eq)
> lagplot(log(predator.eq)) # libraries mgcv and locfit required


We now elaborate on how the regression curves are estimated nonparametrically.
Readers not interested in the technical details may skip to the next section. For
concreteness, suppose we want to estimate the lag 1 regression function. (The extension to other
lags is straightforward.) Nonparametric estimation of the lag 1 regression function
generally makes use of the idea of estimating the conditional mean $m_1(y) = E(Y_t \mid Y_{t-1} = y)$
by averaging those $Y$'s whose lag 1 values are close to $y$. Clearly, the averaging may be
rendered more accurate by giving more weight to those $Y$'s whose lag 1 value is closer
to $y$. The weights are usually assigned systematically via some probability density
function $k(y)$ and a bandwidth parameter $h > 0$. The data pair $(Y_t, Y_{t-1})$ is assigned the
weight

$$w_t = \frac{1}{h}\, k\!\left(\frac{Y_{t-1} - y}{h}\right) \tag{15.1.2}$$

Hereafter we assume that $k(\cdot)$ is the standard normal probability density function. Note
that then the right-hand side of Equation (15.1.2) is the normal probability density
function with mean $y$ and variance $h^2$. Finally, we define the Nadaraya-Watson estimator†


$$\hat{m}_1^{(0)}(y) = \frac{\sum_{t=2}^{n} w_t Y_t}{\sum_{t=2}^{n} w_t} \tag{15.1.3}$$

(The meaning of the superscript 0 will become clear later on.) Since the normal
probability density function is negligible for values that differ from the mean by more than
three standard deviations, the Nadaraya-Watson estimator essentially averages the $Y_t$
whose $Y_{t-1}$ is within $3h$ units from $y$, and the averaging is weighted with more weight
to those observations whose lag 1 values are closer to $y$. The use of the Nadaraya-Watson
estimator of the lag 1 regression function requires us to specify the bandwidth.
There are several methods, including cross-validation, for determining $h$. However, for
an exploratory analysis, we can always use some default bandwidth value and vary it a
bit to get some feel of the shape of the lag 1 regression function.
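The weighted average in Equation (15.1.3) is short enough to write out directly. The following is a minimal base R sketch, assuming a normal kernel; the function name nw_lag1 and its arguments are illustrative and are not part of any package.

```r
# Nadaraya-Watson estimate of the lag 1 regression function at the points y0.
# nw_lag1, series, y0, and h are illustrative names (assumptions of this sketch).
nw_lag1 <- function(series, y0, h) {
  n <- length(series)
  resp <- series[2:n]        # Y_t,     t = 2, ..., n
  lag1 <- series[1:(n - 1)]  # Y_{t-1}, t = 2, ..., n
  sapply(y0, function(y) {
    w <- dnorm(lag1, mean = y, sd = h)  # normal-kernel weights, as in (15.1.2)
    sum(w * resp) / sum(w)              # weighted average, as in (15.1.3)
  })
}

set.seed(1)
y <- as.numeric(arima.sim(n = 200, model = list(ar = 0.5)))
grid <- seq(-1, 1, by = 0.5)
fit <- nw_lag1(y, grid, h = 0.5)   # estimated lag 1 regression on the grid
```

Rerunning the last line with a few different values of h gives a feel for how the bandwidth controls the smoothness of the estimate.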

A more efficient nonparametric estimator may be obtained by assuming that the
underlying regression function can be well approximated locally by a linear function;
see Fan and Gijbels (1996). The local linear estimator of the lag 1 regression function at
$y$ equals $\hat{m}_1^{(1)}(y) = \hat{b}_0$, where $(\hat{b}_0, \hat{b}_1)$ minimizes the local weighted residual
sum of squares:

$$\sum_{t=2}^{n} w_t \left\{Y_t - b_0 - b_1 (Y_{t-1} - y)\right\}^2 \tag{15.1.4}$$

The reader may now guess that the superscript $k$ in the notation $\hat{m}_1^{(k)}(y)$ refers to
the degree of the local polynomial. Often, data are unevenly spaced, in which case a
single bandwidth may not work well. Instead, a variable bandwidth tied to the density of
the data may be more efficient. A simple scheme is the nearest-neighbor scheme that
varies the window width so that it covers a fixed fraction of data nearest to the center of
the window. We set the fraction to be 70% for all our reported lagged regression plots.
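A local linear fit at a single point can be sketched as a weighted least squares regression in which the lag 1 values are centered at the point of interest, so that the fitted intercept is the estimate there. The code below uses a fixed bandwidth and a normal kernel rather than the nearest-neighbor scheme; local_linear_lag1 is an illustrative name, not a package function.

```r
# Local linear estimate of the lag 1 regression function at one point y.
# Fixed bandwidth h and normal kernel are assumptions of this sketch.
local_linear_lag1 <- function(series, y, h) {
  n <- length(series)
  resp <- series[2:n]                 # Y_t
  x <- series[1:(n - 1)] - y          # Y_{t-1} centered at y
  w <- dnorm(x, mean = 0, sd = h)     # kernel weights
  # weighted least squares; the intercept is the fitted value at y
  unname(coef(lm(resp ~ x, weights = w))[1])
}

set.seed(2)
y <- as.numeric(arima.sim(n = 200, model = list(ar = 0.5)))
est <- sapply(c(-1, 0, 1), function(p) local_linear_lag1(y, p, h = 0.5))
```

For data that are exactly linear in the lag, the local linear fit reproduces the line whatever the bandwidth, which is one way to see why it has less bias than a local average near the edges of the data.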


† See Nadaraya (1964) and Watson (1964).

It is important to remember that the local polynomial approach assumes that the
true lag 1 regression function is a smooth function. If the true lag 1 regression function
is discontinuous, then the local polynomial approach may yield misleading estimates.
However, a sharp turn in the estimated regression function may serve as a warning that
the smoothness condition may not hold for the true lag 1 regression function.

15.2  Tests  for  Nonlinearity 

Several tests have been proposed for assessing the need for nonlinear modeling in time
series analysis. Some of these tests, such as those studied by Keenan (1985), Tsay
(1986), and Luukkonen et al. (1988), can be interpreted as Lagrange multiplier tests for
specific nonlinear alternatives.

Keenan (1985) derived a test for nonlinearity analogous to Tukey's one-degree-of-freedom
test for nonadditivity (see Tukey, 1949). Keenan's test is motivated by approximating
a nonlinear stationary time series by a second-order Volterra expansion (Wiener,
1958)

$$Y_t = \mu + \sum_{u=-\infty}^{\infty} \theta_u \varepsilon_{t-u} + \sum_{u=-\infty}^{\infty} \sum_{v=-\infty}^{\infty} \theta_{uv}\, \varepsilon_{t-u} \varepsilon_{t-v} \tag{15.2.1}$$

where $\{\varepsilon_t, -\infty < t < \infty\}$ is a sequence of independent and identically distributed
zero-mean random variables. The process $\{Y_t\}$ is linear if the double sum on the
right-hand side of (15.2.1) vanishes. Thus, we can test the linearity of the time series by
testing whether or not the double sum vanishes. In practice, the infinite series expansion has
to be truncated to a finite sum. Let $Y_1, \ldots, Y_n$ denote the observations. Keenan's test can
be implemented as follows:

(i) Regress $Y_t$ on $Y_{t-1}, \ldots, Y_{t-m}$, including an intercept term, where $m$ is some
prespecified positive integer; calculate the fitted values $\{\hat{Y}_t\}$ and the residuals $\{\hat{e}_t\}$,
for $t = m+1, \ldots, n$; and set $RSS = \sum \hat{e}_t^2$, the residual sum of squares.

(ii) Regress $\hat{Y}_t^2$ on $Y_{t-1}, \ldots, Y_{t-m}$, including an intercept term, and calculate the
residuals $\{\hat{\xi}_t\}$ for $t = m+1, \ldots, n$.

(iii) Regress $\hat{e}_t$ on the residuals $\hat{\xi}_t$ without an intercept for $t = m+1, \ldots, n$.
Keenan's test statistic, denoted by $\hat{F}$, is obtained by multiplying the F-statistic for
testing that the last regression function is identically zero by $(n - 2m - 2)/(n - m - 1)$.
Specifically, let

$$\hat{\eta} = \hat{\eta}_0 \sqrt{\sum_{t=m+1}^{n} \hat{\xi}_t^2} \tag{15.2.2}$$

where $\hat{\eta}_0$ is the regression coefficient. Form the test statistic

$$\hat{F} = \frac{\hat{\eta}^2 (n - 2m - 2)}{RSS - \hat{\eta}^2} \tag{15.2.3}$$

Under the null hypothesis of linearity, the test statistic $\hat{F}$ is approximately
distributed as an F-distribution with degrees of freedom 1 and $n - 2m - 2$.
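Steps (i) to (iii) translate almost line by line into base R. The sketch below is only an illustration of the recipe; keenan_sketch is a hypothetical name, and m is the working autoregressive order supplied by the user.

```r
# Keenan's test following steps (i)-(iii); keenan_sketch is an illustrative name.
keenan_sketch <- function(y, m) {
  y <- as.numeric(y)
  n <- length(y)
  X <- embed(y, m + 1)           # row for time t: Y_t, Y_{t-1}, ..., Y_{t-m}
  resp <- X[, 1]
  lags <- X[, -1, drop = FALSE]
  fit1 <- lm(resp ~ lags)                        # step (i)
  e.hat <- residuals(fit1)
  rss <- sum(e.hat^2)
  yhat2 <- fitted(fit1)^2
  xi.hat <- residuals(lm(yhat2 ~ lags))          # step (ii)
  eta0 <- sum(e.hat * xi.hat) / sum(xi.hat^2)    # step (iii): slope, no intercept
  eta <- eta0 * sqrt(sum(xi.hat^2))              # Equation (15.2.2)
  stat <- eta^2 * (n - 2 * m - 2) / (rss - eta^2)  # Equation (15.2.3)
  p <- pf(stat, 1, n - 2 * m - 2, lower.tail = FALSE)
  list(statistic = stat, p.value = p)
}

set.seed(42)
y <- as.numeric(arima.sim(n = 120, model = list(ar = c(0.5, -0.3))))
out <- keenan_sketch(y, m = 2)
```

The same statistic is obtained as the usual F-statistic for adding the squared fitted values as an extra covariate to the AR(m) regression, which gives a convenient check on the computation.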

Keenan's test can be derived heuristically as follows. Consider the following model:

$$Y_t = \theta_0 + \phi_1 Y_{t-1} + \cdots + \phi_m Y_{t-m} + \exp\left\{\eta\left(\sum_{j=1}^{m}\phi_j Y_{t-j}\right)^2\right\} + \varepsilon_t \tag{15.2.4}$$

where $\{\varepsilon_t\}$ are independent and normally distributed with zero mean and finite
variance. If $\eta = 0$, the exponential term becomes 1 and can be absorbed into the intercept
term so that the preceding model becomes an AR($m$) model. On the other hand, for
nonzero $\eta$, the preceding model is nonlinear. Using the expansion $\exp(x) \approx 1 + x$, which
holds for $x$ of small magnitude, it can be seen that, for small $\eta$, $Y_t$ follows approximately
a quadratic AR model:

$$Y_t = \theta_0 + 1 + \phi_1 Y_{t-1} + \cdots + \phi_m Y_{t-m} + \eta\left(\sum_{j=1}^{m}\phi_j Y_{t-j}\right)^2 + \varepsilon_t \tag{15.2.5}$$

This is a restricted linear model in that the last covariate is the square of the linear term
$\phi_1 Y_{t-1} + \cdots + \phi_m Y_{t-m}$, which is replaced by the fitted values $\hat{Y}_t$ under the null
hypothesis. Keenan's test is equivalent to testing $\eta = 0$ in the multiple regression model (with
the constant 1 being absorbed into $\theta_0$):

$$Y_t = \theta_0 + \phi_1 Y_{t-1} + \cdots + \phi_m Y_{t-m} + \eta \hat{Y}_t^2 + \varepsilon_t \tag{15.2.6}$$

which can be carried out in the manner described in the beginning of this section. Note
that the fitted values are only available for $n \ge t \ge m + 1$. Keenan's test is the same as the
F-test for testing whether or not $\eta = 0$. A more formal approach is facilitated by the
Lagrange multiplier test; see Tong (1990).

Keenan's test is both conceptually and computationally simple and only has one
degree of freedom, which makes the test very useful for small samples. However,
Keenan's test is powerful only for detecting nonlinearity in the form of the square of the
approximating linear conditional mean function. Tsay (1986) extended Keenan's
approach by considering more general nonlinear alternatives. A more general alternative
to linearity may be formulated by replacing the term

$$\exp\left\{\eta\left(\sum_{j=1}^{m}\phi_j Y_{t-j}\right)^2\right\} \tag{15.2.7}$$

by

$$\begin{aligned}
\exp\bigl(&\delta_{1,1}Y_{t-1}^2 + \delta_{1,2}Y_{t-1}Y_{t-2} + \cdots + \delta_{1,m}Y_{t-1}Y_{t-m} \\
&+ \delta_{2,2}Y_{t-2}^2 + \delta_{2,3}Y_{t-2}Y_{t-3} + \cdots + \delta_{2,m}Y_{t-2}Y_{t-m} + \cdots \\
&+ \delta_{m-1,m-1}Y_{t-m+1}^2 + \delta_{m-1,m}Y_{t-m+1}Y_{t-m} + \delta_{m,m}Y_{t-m}^2\bigr)
\end{aligned} \tag{15.2.8}$$

Using the approximation $\exp(x) \approx 1 + x$, we see that the nonlinear model is approximately
a quadratic AR model. But the coefficients of the quadratic terms are now
unconstrained. Tsay's test is equivalent to considering the following quadratic regression
model:

$$\begin{aligned}
Y_t = {}& \theta_0 + \phi_1 Y_{t-1} + \cdots + \phi_m Y_{t-m} \\
&+ \delta_{1,1}Y_{t-1}^2 + \delta_{1,2}Y_{t-1}Y_{t-2} + \cdots + \delta_{1,m}Y_{t-1}Y_{t-m} \\
&+ \delta_{2,2}Y_{t-2}^2 + \delta_{2,3}Y_{t-2}Y_{t-3} + \cdots + \delta_{2,m}Y_{t-2}Y_{t-m} + \cdots \\
&+ \delta_{m-1,m-1}Y_{t-m+1}^2 + \delta_{m-1,m}Y_{t-m+1}Y_{t-m} + \delta_{m,m}Y_{t-m}^2 + \varepsilon_t
\end{aligned} \tag{15.2.9}$$

and testing whether or not all the $m(m+1)/2$ coefficients $\delta_{i,j}$ are zero. Again, this can be
carried out by an F-test that all $\delta_{i,j}$'s are zero in the preceding equation. For a rigorous
derivation of Tsay's test as a Lagrange multiplier test, see Tong (1990).
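The F-test in the quadratic regression (15.2.9) can be sketched in base R by comparing the AR(m) fit with the fit that adds all squares and cross-products of the lags. The name tsay_sketch is illustrative; this is a direct reading of the regression formulation, not necessarily how any particular package computes the test.

```r
# F-test that all quadratic coefficients in (15.2.9) are zero.
# tsay_sketch is an illustrative name (an assumption of this sketch).
tsay_sketch <- function(y, m) {
  y <- as.numeric(y)
  X <- embed(y, m + 1)           # row for time t: Y_t, Y_{t-1}, ..., Y_{t-m}
  resp <- X[, 1]
  lags <- X[, -1, drop = FALSE]
  # index pairs (i, j) with i <= j: the m(m+1)/2 products Y_{t-i} * Y_{t-j}
  idx <- which(upper.tri(diag(m), diag = TRUE), arr.ind = TRUE)
  quad <- lags[, idx[, 1], drop = FALSE] * lags[, idx[, 2], drop = FALSE]
  fit0 <- lm(resp ~ lags)          # linear AR(m) regression
  fit1 <- lm(resp ~ lags + quad)   # quadratic regression
  tab <- anova(fit0, fit1)
  list(statistic = tab$F[2],
       df = c(tab$Df[2], tab$Res.Df[2]),
       p.value = tab[["Pr(>F)"]][2])
}

set.seed(7)
y <- as.numeric(arima.sim(n = 150, model = list(ar = c(0.5, -0.3))))
out <- tsay_sketch(y, m = 2)
```

With m = 2 there are three quadratic terms, so the numerator degrees of freedom are 3; the number grows quickly with m, which is why the working order should be kept modest for short series.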

We now illustrate these tests with two real datasets. In the first application, we use
the annual American (relative) sunspot numbers collected from 1945 to 2007. The
annual (relative) sunspot number is a weighted average of solar activities measured from
a network of observatories. Historically, the daily sunspot number was computed as
some weighted sum of the count of visible, distinct spots and that of clusters of spots on
the solar surface. The sunspot number reflects the intensity of solar activity. Below, the
sunspot data are square root transformed to make them more normally distributed; see
Exhibit 15.4. The time series plot shows that the sunspot series tends to rise more
quickly than it declines, suggesting that it is time-irreversible.


Exhibit 15.4 Annual American Relative Sunspot Numbers

> win.graph(width=4.875,height=2.5,pointsize=8)
> data(spots)
> plot(sqrt(spots),type='o',xlab='Year',
   ylab='Sqrt Sunspot Number')

To carry out the tests for nonlinearity, we have to specify $m$, the working autoregressive
order. Under the null hypothesis that the process is linear, the order can be specified
by using some information criterion, for example, the AIC. For the sunspot data, $m = 5$
based on the AIC. Both the Keenan test and the Tsay test reject linearity, with
p-values being 0.0002 and 0.0009, respectively.

For  the  second  example,  we  consider  the  predator  series  discussed  in  the  preceding 
section.  The  working  AR  order  is  found  to  be  4.  Both  the  Keenan  test  and  the  Tsay  test 
reject  linearity,  with  p-values  being  0.00001  and  0.03,  respectively,  which  is  consistent 
with  the  inference  drawn  from  the  lagged  regression  plots  reported  earlier. 

There are some other tests, such as the BDS test developed by Brock, Dechert, and
Scheinkman (1996), based on concepts that arise in the theory of chaos, and the
neural-network test, proposed by White (1989) for testing "neglected nonlinearity." For a
review of tests for nonlinearity, see Tong (1990) and Granger and Teräsvirta
(1993). We shall introduce one more test later.

15.3  Polynomial  Models  Are  Generally  Explosive 


In  nonlinear  regression  analysis,  polynomial  regression  models  of  higher  degrees  are 
sometimes  employed,  even  though  they  are  deemed  not  useful  for  extrapolation  because 
of  their  quick  blowup  to  infinity.  For  this  reason,  polynomial  regression  models  are  of 
limited  practical  use.  Based  on  the  same  reasoning,  polynomial  time  series  models  may 
be  expected  to  do  poorly  in  prediction.  Indeed,  polynomial  time  series  models  of  degree 
higher  than  1  and  with  Gaussian  errors  are  invariably  explosive.  To  see  this,  consider  the 
following  simple  quadratic  AR(1)  model. 

$$Y_t = \phi Y_{t-1}^2 + e_t \tag{15.3.1}$$

where $\{e_t\}$ are independent and identically distributed standard normal random
variables. Let $\phi > 0$ and let $c$ be a large number that is greater than $3/\phi$. If $Y_1 > c$ (which may
happen with positive probability due to the normality of the errors), then $Y_2 > 3Y_1 + e_2$
and hence $Y_2 > 2c$ with some nonzero probability. With careful probability analysis, it
can be shown that, with positive probability, the quadratic AR(1) process satisfies the
inequality $Y_t > 2^t c$ for $t = 1, 2, 3, \ldots$ and hence blows up to $+\infty$. Indeed, the quadratic
AR(1) process, with normal errors, goes to infinity with probability 1.
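The take-off is easy to observe in a small simulation. The base R sketch below iterates the quadratic AR(1) recursion directly; qar1_path is an illustrative name, and the bound 1e6 used to flag the explosion is an arbitrary choice.

```r
# Simulate Y_t = phi * Y_{t-1}^2 + e_t with standard normal errors.
# qar1_path is an illustrative name (an assumption of this sketch).
qar1_path <- function(n, phi, y1 = 0) {
  y <- numeric(n)
  y[1] <- y1
  for (t in 2:n) y[t] <- phi * y[t - 1]^2 + rnorm(1)
  y
}

set.seed(1)
y <- qar1_path(n = 200, phi = 0.5)
first.big <- which(abs(y) > 1e6)[1]   # NA if the path has not taken off yet

# Starting well above 3/phi, the path increases at every step
set.seed(1)
big.start <- qar1_path(n = 6, phi = 0.5, y1 = 10)
```

Once a value exceeds the bound, later values overflow to Inf in floating point, which is the numerical counterpart of the process blowing up.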

As an example, Exhibit 15.5 displays a realization from a quadratic AR(1) model
with $\phi = 0.5$ and standard normal errors that takes off to infinity at $t = 15$.

Note that the quadratic AR(1) process becomes explosive only when the process
takes some value of sufficiently large magnitude. If the coefficient $\phi$ is small, it may
take much longer for the quadratic AR(1) process to take off to infinity. Normal errors
can take arbitrarily large values, although rather rarely, but when this happens, the
process becomes explosive. Thus, any noise distribution that is unbounded will guarantee
the explosiveness of the quadratic AR(1) model. Chan and Tong (1994) further showed
that this explosive behavior is true for any polynomial autoregressive process of degree
higher than 1 and of any finite order when the noise distribution is unbounded.

Exhibit 15.5 A Simulated Quadratic AR(1) Process with $\phi = 0.5$

> set.seed(1234567)
> plot(y=qar.sim(n=15,phi1=.5,sigma=1),x=1:15,type='o',
   ylab=expression(Y[t]),xlab='t')


It is interesting to note that, for bounded errors, a polynomial autoregressive model
may admit a stationary distribution that could be useful for modeling nonlinear time
series data; see Chan and Tong (1994). For example, Exhibit 15.6 displays the time
series solution of a deterministic logistic map, namely $Y_t = 3.97\,Y_{t-1}(1 - Y_{t-1})$, $t = 2,
3, \ldots$, with the initial value $Y_1 = 0.377$. Its corresponding sample ACF is shown in Exhibit
15.7, which, except for the mildly significant lag 4, resembles that of white noise. Note
that, for a sufficiently large initial value, the solution of the logistic map will explode to
infinity.
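The logistic map can also be iterated in a few lines of base R, which makes it easy to check both the bounded trajectory and the divergence from a large starting value. The function name logistic_map is illustrative.

```r
# Iterate Y_t = a * Y_{t-1} * (1 - Y_{t-1}); logistic_map is an illustrative name.
logistic_map <- function(n, a, y1) {
  y <- numeric(n)
  y[1] <- y1
  for (t in 2:n) y[t] <- a * y[t - 1] * (1 - y[t - 1])
  y
}

y <- logistic_map(100, a = 3.97, y1 = 0.377)
rng <- range(y)                  # the trajectory stays inside [0, 1]
r <- acf(y, plot = FALSE)        # sample ACF without drawing the plot

# An initial value outside [0, 1] leads to rapid divergence
z <- logistic_map(8, a = 3.97, y1 = 1.5)
```

The trajectory started at 0.377 stays in the unit interval because the map sends [0, 1] into [0, 3.97/4], while the one started at 1.5 leaves it immediately and grows in magnitude at every step.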


Exhibit 15.6 The Trajectory of the Logistic Map with Parameter 3.97 and
Initial Value = 0.377

> y=qar.sim(n=100,const=0.0,phi0=3.97,phi1=-3.97,sigma=0,
   init=.377)
> plot(x=1:100,y=y,type='l',ylab=expression(Y[t]),xlab='t')

> acf(y)

However, the bound on the noise distribution necessary for the existence of a
stationary polynomial autoregressive model varies with the model parameters and the
initial value, which greatly complicates the modeling task. Henceforth, we shall not pursue
the use of polynomial models in time series analysis.

15.4  First-Order  Threshold  Autoregressive  Models 


The discussion in the preceding section provides an important insight that for a
nonlinear time series model to be stationary, it must be either linear or approaching linearity in
the "tail." From this perspective, piecewise linear models, more widely known as
threshold models, constitute the simplest class of nonlinear model. Indeed, the
usefulness of threshold models in nonlinear time series analysis was well-documented by the
seminal work of Tong (1978, 1983, 1990) and Tong and Lim (1980), resulting in an
extensive literature of ongoing theoretical innovations and applications in various fields.

The specification of a threshold model requires specifying the number of linear
submodels and the mechanism dictating which of them is operational. Consequently,
there exist many variants of the threshold model. Here, we focus on the two-regime
self-exciting threshold autoregressive (SETAR) model introduced by Tong, for which
the switching between the two linear submodels depends solely on the position of the
threshold variable. For the SETAR model (simply referred to as the TAR model below),
the threshold variable is a certain lagged value of the process itself; hence the adjective
self-exciting. (More generally, the threshold variable may be some vector covariate
process or even some latent process, but this extension will not be pursued here.) To fix
ideas, consider the following first-order TAR model:

$$Y_t = \begin{cases}
\phi_{1,0} + \phi_{1,1}Y_{t-1} + \sigma_1 e_t, & \text{if } Y_{t-1} \le r \\
\phi_{2,0} + \phi_{2,1}Y_{t-1} + \sigma_2 e_t, & \text{if } Y_{t-1} > r
\end{cases} \tag{15.4.1}$$


where the $\phi$'s are autoregressive parameters, $\sigma$'s are noise standard deviations, $r$ is the
threshold parameter, and $\{e_t\}$ is a sequence of independent and identically distributed
random variables with zero mean and unit variance. Thus, if the lag 1 value of $Y_t$ is not
greater than the threshold, the conditional distribution of $Y_t$ is the same as that of an
AR(1) process with intercept $\phi_{1,0}$, autoregressive coefficient $\phi_{1,1}$, and error variance
$\sigma_1^2$, in which case we may say that the first AR(1) submodel is operational. On the other
hand, when the lag 1 value of $Y_t$ exceeds the threshold $r$, the second AR(1) process with
parameters $(\phi_{2,0}, \phi_{2,1}, \sigma_2)$ is operational. Thus, the process switches between two
linear mechanisms dependent on the position of the lag 1 value of the process. When the
lag 1 value does not exceed the threshold, we say that the process is in the lower (first)
regime, and otherwise it is in the upper regime. Note that the error variance need not be
identical for the two regimes, so that the TAR model can account for some conditional
heteroscedasticity in the data.

As a concrete example, we simulate some data from the following first-order TAR
model:

$$Y_t = \begin{cases}
0.5Y_{t-1} + e_t, & \text{if } Y_{t-1} \le -1 \\
-1.8Y_{t-1} + 2e_t, & \text{if } Y_{t-1} > -1
\end{cases} \tag{15.4.2}$$


Exhibit  15.8  shows  the  time  series  plot  of  the  simulated  data  of  size  n  =  100.  A  notable 
feature  of  the  plot  is  that  the  time  series  is  somewhat  cyclical,  with  asymmetrical  cycles 
where  the  series  tends  to  drop  rather  sharply  but  rises  relatively  slowly.  This  asymmetry 
means  that  the  probabilistic  structure  of  the  process  will  be  different  if  we  reverse  the 
direction  of  time.  One  way  to  see  this  is  to  make  a  transparency  of  the  time  series  plot 
and  flip  the  transparency  over  to  see  the  time  series  plot  with  time  reversed.  In  this  case, 
the  simulated  data  will  rise  sharply  and  drop  slowly  with  time  reversed.  Recall  that  this 
phenomenon  is  known  as  time  irreversibility.  For  a  stationary  Gaussian  ARMA  process, 
the  probabilistic  structure  is  determined  by  its  first  and  second  moments,  which  are 
invariant  with  respect  to  time  reversal,  hence  the  process  must  be  time-reversible.  Many 
real  time  series,  for  example  the  predator  series  and  the  relative  sunspot  series,  appear  to 
be  time-irreversible,  suggesting  that  the  underlying  process  is  nonlinear.  Exhibit  15.9 
shows  the  QQ  normal  score  plot  for  the  simulated  data.  It  shows  that  the  distribution  of 
simulated  data  has  a  thicker  tail  than  a  normal  distribution,  despite  the  fact  that  the 
errors are normally distributed.

Exhibit 15.8 A Simulated First-Order TAR Process

> set.seed(1234579)
> y=tar.sim(n=100,Phi1=c(0,0.5),Phi2=c(0,-1.8),p=1,d=1,sigma1=1,
   thd=-1,sigma2=2)$y
> plot(y=y,x=1:100,type='o',xlab='t',ylab=expression(Y[t]))

Exhibit 15.9 QQ Normal Plot for the Simulated TAR Process

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(y); qqline(y)
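The regime-switching mechanism of the first-order TAR model (15.4.2) can be written out explicitly in base R. The sketch below is not the tar.sim function used for the exhibits; tar1_sim, its arguments, and the burn-in length are illustrative choices, so its output will not reproduce Exhibit 15.8.

```r
# Simulate a first-order two-regime TAR model with zero intercepts.
# tar1_sim and its defaults are illustrative (assumptions of this sketch).
tar1_sim <- function(n, r = -1, phi = c(0.5, -1.8), sigma = c(1, 2),
                     y1 = 0, burn = 100) {
  total <- n + burn
  y <- numeric(total)
  y[1] <- y1
  e <- rnorm(total)
  for (t in 2:total) {
    k <- if (y[t - 1] <= r) 1 else 2     # regime picked by the lag 1 value
    y[t] <- phi[k] * y[t - 1] + sigma[k] * e[t]
  }
  y[(burn + 1):total]                    # drop the burn-in period
}

set.seed(123)
y <- tar1_sim(100)
lower.frac <- mean(y[-length(y)] <= -1)  # share of time in the lower regime
```

Setting both noise standard deviations to zero and burn to zero turns the simulator into the noise-free recursion, which is convenient for checking the code by hand.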


The autoregressive coefficient of the submodel in the upper regime equals -1.8, yet
the simulated data appear to be stationary, which may be unexpected from a linear
perspective, as an AR(1) model cannot be stationary if the autoregressive coefficient
exceeds 1 in magnitude. This puzzle may be better understood by considering the case
of no noise terms in either regime; that is, $\sigma_1 = \sigma_2 = 0$. The deterministic process thus
defined is referred to as the skeleton of the TAR model. We show below that, for any
initial value, the skeleton is eventually a bounded process; the stability of the skeleton
underlies the stationarity of the TAR model. Readers not interested in the detailed
analysis verifying the ultimate boundedness of the skeleton may skip to the next paragraph.
Let the initial value $y_1$ be some large number, say 10, a value falling in the upper regime.
So, the next value is $y_2 = (-1.8) \times 10 = -18$, which is in the lower regime. Therefore, the
third value equals $y_3 = 0.5 \times (-18) = -9$. As the third value is in the lower regime, the
fourth value equals $y_4 = 0.5 \times (-9) = -4.5$, which remains in the lower regime, so that the
fifth value equals $y_5 = 0.5 \times (-4.5) = -2.25$. It is clear that once the data remain in the
lower regime, they will be halved in the next iterate and this process continues until
some future iterate crosses the threshold -1, which occurs for $y_7 = -0.5625$. Now the
second linear submodel is operational, so that $y_8 = (-1.8) \times (-0.5625) = 1.0125$ and $y_9 =
(-1.8) \times 1.0125 = -1.8225$, which is again in the lower regime. In conclusion, if some
iterate is in the lower regime, the next iterate is obtained by halving the previous iterate
until some future iterate exceeds -1. On the other hand, if some iterate exceeds 1, the
next iterate must be less than -1 and hence in the lower regime. By routine analysis, it
can be checked that the process is eventually trapped between -3.24 and 1.8 and hence is a
bounded process.
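The iterates of the skeleton are easy to reproduce numerically, and trying several starting values gives a quick check of boundedness. In this base R sketch, skeleton is an illustrative name and the five starting values are arbitrary.

```r
# Skeleton of the TAR model (15.4.2): the recursion with the noise switched off.
# skeleton is an illustrative name (an assumption of this sketch).
skeleton <- function(y1, n, r = -1, phi = c(0.5, -1.8)) {
  y <- numeric(n)
  y[1] <- y1
  for (t in 2:n) {
    y[t] <- (if (y[t - 1] <= r) phi[1] else phi[2]) * y[t - 1]
  }
  y
}

first9 <- skeleton(10, 9)   # the nine iterates worked out in the text

# Range of iterates 101 to 200 for a few arbitrary starting values
starts <- c(-50, -0.3, 0.7, 10, 1000)
tails <- sapply(starts, function(s) range(skeleton(s, 200)[101:200]))
```

Each column of tails holds the smallest and largest late iterate for one starting value; comparable ranges across very different starts are what a bounded skeleton looks like in practice.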

A bounded skeleton is stable in some sense. Chan and Tong (1985) showed that
under some mild conditions, a TAR model is asymptotically stationary if its skeleton is
stable. In fact, stability of the skeleton together with some regularity conditions implies
the stronger property of ergodicity, namely, the process admits a stationary distribution
and for any function $h(Y_t)$ having a finite stationary first moment (which holds if $h$ is a
bounded function),

$$\frac{1}{n}\sum_{t=1}^{n} h(Y_t) \tag{15.4.3}$$

converges to the stationary mean of $h(Y_t)$, computed according to the stationary
distribution. See Cline and Pu (2001) for a recent survey on the linkage between stability and
ergodicity and counterexamples when this linkage may fail to hold.

The stability analysis of the skeleton can be much simplified by the fact that the
ergodicity of a TAR model can be inferred from the stability of an associated skeleton
defined by a difference equation obtained by modifying the equation defining the TAR
model by suppressing the noise terms and the intercepts (that is, zero errors and zero
intercepts) and setting the threshold to 0. For the simulated example, the associated
skeleton is then defined by the following difference equation:

$$Y_t = \begin{cases}
0.5Y_{t-1}, & \text{if } Y_{t-1} \le 0 \\
-1.8Y_{t-1}, & \text{if } Y_{t-1} > 0
\end{cases} \tag{15.4.4}$$

Now, the solution to the skeleton above can be readily obtained: Given a positive value
for $y_1$, $y_t = (-1.8) \times 0.5^{t-2} \times y_1$, for all $t \ge 2$. For negative $y_1$, $y_t = 0.5^{t-1} \times y_1$. In both cases,
$y_t \to 0$, as $t \to \infty$. The origin is said to be an equilibrium point as $y_t = 0$, for all $t$, if $y_1 =
0$. The origin is then said to be a globally exponentially stable limit point, as the skeleton
approaches it exponentially fast for any nonzero initial value. It can be shown (Chan and
Tong, 1985) that the origin is a globally exponentially stable limit point for the skeleton
if the parameters satisfy the constraints

$$\phi_{1,1} < 1, \quad \phi_{2,1} < 1, \quad \phi_{1,1}\phi_{2,1} < 1 \tag{15.4.5}$$

in which case the first-order TAR model is ergodic and hence stationary. Exhibit 15.10
shows the region of stationarity shaded in gray. Note that the region of stationarity is
substantially larger than the region defined by the linear time series inspired constraints
$|\phi_{1,1}| < 1$, $|\phi_{2,1}| < 1$, corresponding to the region bounded by the inner square in Exhibit
15.10. For parameters lying strictly outside the region defined by the constraints
(Equations (15.4.5)), the skeleton is unstable and the TAR model is nonstationary. For
example, if $\phi_{2,1} > 1$, then the skeleton will escape to positive infinity for all sufficiently large
initial values. On the boundary of the parametric region defined by (15.4.5), the
intercept terms of the TAR model are pivotal in determining the stability of the skeleton and
the stationarity of the TAR models; see Chan et al. (1985). In practice, we can check if
the skeleton is stable numerically by using several different initial values. A stable
skeleton gives us more confidence in assuming that the model is stationary.
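Both the parametric constraints (15.4.5) and the numerical check of the associated skeleton are simple to code. The base R sketch below uses illustrative function names, tar1_ergodic and assoc_skeleton, which are not part of any package.

```r
# Check the constraints (15.4.5) for a first-order TAR model.
# tar1_ergodic and assoc_skeleton are illustrative names.
tar1_ergodic <- function(phi11, phi21) {
  phi11 < 1 && phi21 < 1 && phi11 * phi21 < 1
}

# Associated skeleton: no noise, no intercepts, threshold set to 0
assoc_skeleton <- function(y1, n, phi11, phi21) {
  y <- numeric(n)
  y[1] <- y1
  for (t in 2:n) {
    y[t] <- (if (y[t - 1] <= 0) phi11 else phi21) * y[t - 1]
  }
  y
}

ok.sim  <- tar1_ergodic(0.5, -1.8)   # TRUE: the simulated model (15.4.2)
ok.prod <- tar1_ergodic(-2, -0.6)    # FALSE: the product 1.2 is not below 1
ok.up   <- tar1_ergodic(0.5, 1.2)    # FALSE: upper-regime coefficient above 1

path <- assoc_skeleton(5, 60, 0.5, -1.8)   # decays toward the origin
```

For a positive start the path agrees with the closed form (-1.8) x 0.5^(t-2) x y1 given in the text, so the numerical and analytical checks can be compared directly.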


Exhibit  15.10  Stationarity  Region  for  the  First-Order  TAR  Model  (Shaded) 


[Plot omitted; horizontal axis: $\phi_{1,1}$, ranging from -3 to 3.]


15.5  Threshold  Models 


The first-order (self-exciting) threshold autoregressive model can be readily extended to
higher order and with a general integer delay:

$$Y_t = \begin{cases} \phi_{1,0} + \phi_{1,1}Y_{t-1} + \cdots + \phi_{1,p_1}Y_{t-p_1} + \sigma_1 e_t, & \text{if } Y_{t-d} \le r \\ \phi_{2,0} + \phi_{2,1}Y_{t-1} + \cdots + \phi_{2,p_2}Y_{t-p_2} + \sigma_2 e_t, & \text{if } Y_{t-d} > r \end{cases} \qquad (15.5.1)$$

Note that the autoregressive orders $p_1$ and $p_2$ of the two submodels need not be identical,
and the delay parameter d may be larger than the maximum autoregressive orders.
However, by including zero coefficients if necessary, we may and shall henceforth
assume that $p_1 = p_2 = p$ and $1 \le d \le p$, which simplifies the notation. The TAR model
defined by Equation (15.5.1) is denoted as the TAR(2;$p_1$,$p_2$) model with delay d.
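To make the two-regime recursion concrete, here is a small base R simulation sketch. The function name `sim_tar`, its arguments, the zero starting values, the burn-in length, and the parameter values in the usage line are all illustrative assumptions.

```r
# Simulate n observations from a two-regime TAR model with common order p and delay d.
# phi1 and phi2 hold c(intercept, lag-1 coefficient, ..., lag-p coefficient).
sim_tar <- function(n, phi1, phi2, sigma1 = 1, sigma2 = 1, r = 0, d = 1,
                    burn = 200) {
  p <- length(phi1) - 1
  stopifnot(length(phi2) == p + 1, d >= 1, d <= p)
  total <- n + burn + p
  y <- numeric(total)              # the first p values are started at 0 (an assumption)
  e <- rnorm(total)                # standard normal noise
  for (t in (p + 1):total) {
    lags <- y[t - (1:p)]           # Y_{t-1}, ..., Y_{t-p}
    if (y[t - d] <= r) {           # lower regime
      y[t] <- phi1[1] + sum(phi1[-1] * lags) + sigma1 * e[t]
    } else {                       # upper regime
      y[t] <- phi2[1] + sum(phi2[-1] * lags) + sigma2 * e[t]
    }
  }
  tail(y, n)                       # drop the starting values and the burn-in
}

set.seed(1)
y_sim <- sim_tar(200, phi1 = c(0, 0.5), phi2 = c(0, -1.8),
                 sigma1 = 1, sigma2 = 2, r = -1, d = 1)
```

Discarding a burn-in reduces the influence of the arbitrary starting values, which matters because the stationary distribution of a TAR model is not available in closed form.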

Again, the stability of the associated skeleton, obtained by setting the threshold to
zero and suppressing the noise terms and the intercepts, implies that the TAR model is
ergodic and stationary. However, the stability of the associated skeleton is now much
more complex in the higher-order case, so much so that the necessary and sufficient
parametric conditions for the stationarity of the TAR model are still unknown. Nonetheless,
there exist some simple sufficient conditions for the stationarity of a TAR model.
For example, the TAR model is ergodic and hence asymptotically stationary if
$|\phi_{1,1}| + \cdots + |\phi_{1,p}| < 1$ and $|\phi_{2,1}| + \cdots + |\phi_{2,p}| < 1$; see Chan and Tong (1985).
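The sufficient condition can be checked mechanically. The helper below is a hypothetical illustration (the name `tar_sufficient` is not from any package); it takes the lag coefficients of each regime, excluding intercepts.

```r
# TRUE when the absolute lag coefficients of each regime sum to less than 1.
tar_sufficient <- function(phi1_lags, phi2_lags) {
  sum(abs(phi1_lags)) < 1 && sum(abs(phi2_lags)) < 1
}

ok_example   <- tar_sufficient(c(0.4, -0.3), c(0.2, 0.5))  # both sums are 0.7
fail_example <- tar_sufficient(0.5, -1.8)                  # upper-regime sum is 1.8
```

The second call returns FALSE even though the first-order pair (0.5, -1.8) satisfies the constraints (15.4.5), which illustrates that the condition is sufficient but not necessary.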

So far, we have considered the case of two regimes defined by the partition
$-\infty < r < \infty$ of the real line, so that the first (second) submodel is operational if
$Y_{t-d}$ lies in the first (second) interval. The extension to the case of m regimes is
straightforward and effected by partitioning the real line into
$-\infty < r_1 < r_2 < \cdots < r_{m-1} < \infty$, and the position of $Y_{t-d}$ relative to these
thresholds determines which linear submodel is operational. We shall not pursue this topic
further but shall restrict our discussion to the case of two regimes.


15.6  Testing  for  Threshold  Nonlinearity 


While Keenan’s test and Tsay’s test for nonlinearity are designed for detecting quadratic
nonlinearity, they may not be sensitive to threshold nonlinearity. Here, we discuss a
likelihood ratio test with the threshold model as the specific alternative. The null hypothesis
is an AR(p) model versus the alternative hypothesis of a two-regime TAR model of
order p and with constant noise variance; that is, $\sigma_1 = \sigma_2 = \sigma$. With these assumptions,
the general model can be rewritten as

$$Y_t = \phi_{1,0} + \phi_{1,1}Y_{t-1} + \cdots + \phi_{1,p}Y_{t-p} + \{\phi_{2,0} + \phi_{2,1}Y_{t-1} + \cdots + \phi_{2,p}Y_{t-p}\}\,I(Y_{t-d} > r) + \sigma e_t \qquad (15.6.1)$$

where the notation $I(\cdot)$ is an indicator variable that equals 1 if and only if the enclosed
expression is true. Moreover, in this formulation, the coefficient $\phi_{2,0}$ represents the
change in the intercept in the upper regime relative to that of the lower regime, and
similarly interpreted are $\phi_{2,1}, \ldots, \phi_{2,p}$. The null hypothesis states that
$\phi_{2,0} = \phi_{2,1} = \cdots = \phi_{2,p} = 0$. While the delay may be theoretically larger than the
autoregressive order, this is seldom the case in practice. Hence, it is assumed that
$d \le p$ throughout this section, and under this assumption and assuming the validity of
linearity, the large-sample distribution of the test does not depend on d.

In practice, the test is carried out with fixed p and d. The likelihood ratio test
statistic can be shown to be equivalent to

$$T_n = (n - p)\log\left\{\frac{\hat\sigma^2(H_0)}{\hat\sigma^2(H_1)}\right\} \qquad (15.6.2)$$

where n - p is the effective sample size, $\hat\sigma^2(H_0)$ is the maximum likelihood estimator
of the noise variance from the linear AR(p) fit and $\hat\sigma^2(H_1)$ from the TAR fit with the
threshold searched over some finite interval. See the next section for a detailed
discussion on estimating a TAR model. Under the null hypothesis that
$\phi_{2,0} = \phi_{2,1} = \cdots = \phi_{2,p} = 0$, the (nuisance) parameter r is absent. Hence, the sampling
distribution of the likelihood ratio test under $H_0$ is no longer approximately $\chi^2$ with p
degrees of freedom. Instead, it has a nonstandard sampling distribution; see Chan (1991)
and Tong (1990). Chan (1991) derived an approximation method for computing the p-values of
the test that is highly accurate for small p-values. The test depends on the interval over
which the threshold parameter is searched. Typically, the interval is defined to be from the
a×100th percentile to the b×100th percentile of $\{Y_t\}$, say from the 25th percentile to the
75th percentile. The choice of a and b must ensure that there are adequate data falling
into each of the two regimes for fitting the linear submodels.

The  reader  may  wonder  why  the  search  of  the  threshold  is  restricted  to  some  finite 
interval.  Intuitively,  such  a  restriction  is  desirable,  as  we  want  enough  data  to  estimate 
the  parameters  for  the  two  regimes  under  the  alternative  hypothesis.  A  deeper  reason  is 
mathematical  in  nature.  This  restriction  is  necessary  because  if  the  true  model  is  linear, 
the  threshold  parameter  is  undefined,  in  which  case  an  unrestricted  search  may  result  in 
the  threshold  estimator  being  close  to  the  minimum  or  maximum  data  values,  making 
the  large-sample  approximation  ineffective. 
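The statistic in Equation (15.6.2) can be sketched directly in base R. This is a simplified illustration, not the `tlrt` function used in this chapter: the names `mle_var` and `tar_lr_stat` are hypothetical, the threshold is searched over the observed values of $Y_{t-d}$ lying between the a×100th and b×100th percentiles, and no p-value is computed because that requires the approximation of Chan (1991).

```r
# Conditional Gaussian ML estimate of the noise variance from an OLS fit: RSS / cases.
mle_var <- function(X, y) {
  fit <- lm.fit(X, y)
  sum(fit$residuals^2) / length(y)
}

tar_lr_stat <- function(y, p, d, a = 0.25, b = 0.75) {
  n <- length(y)
  E <- embed(y, p + 1)               # row for time t: Y_t, Y_{t-1}, ..., Y_{t-p}
  resp <- E[, 1]
  X <- cbind(1, E[, -1, drop = FALSE])
  z <- E[, d + 1]                    # threshold variable Y_{t-d}
  s2_h0 <- mle_var(X, resp)          # linear AR(p) fit
  cand <- sort(unique(z))
  cand <- cand[cand >= quantile(y, a) & cand <= quantile(y, b)]
  s2_h1 <- min(sapply(cand, function(r) {
    up <- as.numeric(z > r)          # indicator I(Y_{t-d} > r)
    mle_var(cbind(X, X * up), resp)  # model (15.6.1) with a common sigma
  }))
  (n - p) * log(s2_h0 / s2_h1)
}

set.seed(2)
y_lin <- as.numeric(arima.sim(list(ar = 0.5), n = 200))  # a linear AR(1) series
stat_lin <- tar_lr_stat(y_lin, p = 2, d = 1)
```

Because the AR(p) model is nested in model (15.6.1) for every candidate threshold, the statistic is nonnegative; how large it must be to reject linearity depends on the nonstandard null distribution discussed above.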

We illustrate the likelihood ratio test for threshold nonlinearity using the
(square-root-transformed) relative sunspot data and the (log-transformed) predator data.
Recall that both Keenan’s test and Tsay’s test suggested that these data are nonlinear.
Setting p = 5, a = 0.25, and b = 0.75 for the sunspot data, we tried the likelihood ratio test
for threshold nonlinearity with different delays from 1 to 5, resulting in the test statistics
being 46.9, 111.3, 99.1, 85.0, and 45.1, respectively.† Repeating the test with a = 0.1 and
b = 0.9 yields identical results for this case. All the tests above have p-values that round
to 0.000, suggesting that the data-generating mechanism is highly nonlinear. Notice that
the test statistic attains the largest value when d = 2; hence we may tentatively estimate
the delay to be 2. But delay 3 is very competitive.

† The R code to carry out these calculations is as follows:

> pvaluem=NULL
> for (d in 1:5) {res=tlrt(sqrt(spots),p=5,d=d,a=0.25,b=0.75)
> pvaluem=cbind(pvaluem,c(d,res$test.statistic,res$p.value))}
> rownames(pvaluem)=c('d','test statistic','p-value')
> round(pvaluem,3)

Next, consider the predator series, with p = 4, a = 0.25, b = 0.75, and $1 \le d \le 4$. The
test statistics and their p-values, enclosed in parentheses, are found to equal 19.3
(0.026), 28.0 (0.001), 32.0 (0.000), and 16.2 (0.073), respectively. Thus, there is some
evidence that the predator series is nonlinear, with the delay likely to be 2 or 3. Note that
the test is not significant for d = 4 at the 5% significance level.†

15.7  Estimation  of  a  TAR  Model 


Because the stationary distribution of a TAR model does not have a closed-form solution,
estimation is often carried out conditional on the max(p, d) initial values, where p is
the order of the process and d the delay parameter. Moreover, the noise series is often
assumed to be normally distributed, and we will make this assumption throughout this
section. The normal error assumption implies that the response is conditionally normal,
but see Samia, Chan and Stenseth (2007) for some recent work on the nonnormal case.
If the threshold parameter r and the delay parameter d are known, then the data cases
can be split into two parts according to whether or not $Y_{t-d} \le r$. Let there be $n_1$ data
cases in the lower regime. With the data in the lower regime, we can regress $Y_t$ on its
lags 1 to p to find the estimates $\hat\phi_{1,0}, \hat\phi_{1,1}, \ldots, \hat\phi_{1,p}$ and the maximum likelihood
noise variance estimate $\hat\sigma_1^2$; that is, the sum of squared residuals divided by $n_1$. The
number $n_1$ and the parameter estimates for the lower regime generally depend on r and
d; we sometimes write the more explicit notation, for example $n_1(r,d)$, below for clarity.
Similarly, using the data, say $n_2$ of them, falling in the upper regime, we can obtain the
parameter estimates $\hat\phi_{2,0}, \hat\phi_{2,1}, \ldots, \hat\phi_{2,p}$ and $\hat\sigma_2^2$. Clearly, $n_1 + n_2 = n - p$, where n is
the sample size. Substituting these estimates into the log-likelihood function yields the
so-called profile log-likelihood function of (r,d):

$$\ell(r,d) = -\frac{n-p}{2}\{1 + \log(2\pi)\} - \frac{n_1(r,d)}{2}\log\big(\hat\sigma_1(r,d)^2\big) - \frac{n_2(r,d)}{2}\log\big(\hat\sigma_2(r,d)^2\big) \qquad (15.7.1)$$


The estimates of r and d can be obtained by maximizing the profile likelihood function
above. The optimization need only be searched with r over the observed Y’s and
integer d between 1 and p. This is because, for fixed d, the function above is constant
between two consecutive observations.

† The R code for the predator-series calculation is similar to that shown above for the
sunspot data. The details may be found in the R code scripts for Chapter 15 available on
the textbook Website.

However, without some restrictions on the threshold parameter, the (conditional)
maximum likelihood method discussed above will not work. For example, if the lower
regime contains only one data case, the noise variance estimate $\hat\sigma_1^2 = 0$ so that the
conditional log-likelihood function equals $\infty$, in which case the conditional maximum
likelihood estimator is clearly inconsistent. This problem may be circumvented by restricting
the search of the threshold to be between two predetermined percentiles of Y; for example,
between the tenth and ninetieth percentiles.
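A restricted search of the profile log-likelihood (15.7.1) might be sketched in base R as follows. The function names `profile_loglik` and `fit_profile`, the percentile defaults, and the simulated series are assumptions made for illustration; the chapter's own fits use the `tar` function.

```r
# Profile log-likelihood (15.7.1) for a given threshold r and delay d,
# conditional on the first p observations (assumes d <= p).
profile_loglik <- function(y, p, d, r) {
  E <- embed(y, p + 1)               # row for time t: Y_t, Y_{t-1}, ..., Y_{t-p}
  resp <- E[, 1]
  X <- cbind(1, E[, -1, drop = FALSE])
  low <- E[, d + 1] <= r             # lower-regime cases
  n1 <- sum(low)
  n2 <- sum(!low)
  s2 <- function(idx) {              # ML noise variance within one regime
    fit <- lm.fit(X[idx, , drop = FALSE], resp[idx])
    sum(fit$residuals^2) / sum(idx)
  }
  -(n1 + n2) / 2 * (1 + log(2 * pi)) -
    n1 / 2 * log(s2(low)) - n2 / 2 * log(s2(!low))
}

# Maximize over observed values between two percentiles and over d = 1, ..., p.
fit_profile <- function(y, p, lower = 0.1, upper = 0.9) {
  cand <- sort(unique(y))
  cand <- cand[cand >= quantile(y, lower) & cand <= quantile(y, upper)]
  grid <- expand.grid(r = cand, d = 1:p)
  ll <- mapply(function(r, d) profile_loglik(y, p, d, r), grid$r, grid$d)
  grid[which.max(ll), ]
}

set.seed(3)
y <- numeric(300)
for (t in 2:300) y[t] <- (if (y[t - 1] <= 0) 0.6 else -0.5) * y[t - 1] + rnorm(1)
est <- fit_profile(y, p = 2)
```

Restricting the candidates to the middle 80% of the data keeps enough cases in each regime that neither variance estimate collapses to zero.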

Another approach to handle the aforementioned difficulty is to estimate the parameters
using the conditional least squares (CLS) approach. The CLS approach estimates
the parameters by minimizing the predictive sum of squared errors. This is equivalent to
conditional maximum likelihood estimation for the case of homoscedastic
(constant-variance) Gaussian errors; that is, $\sigma_1 = \sigma_2 = \sigma$, so that maximizing the
log-likelihood function is equivalent to minimizing the conditional residual sum of squares:

$$L(r,d) = \sum_{t=p+1}^{n} \Big\{ (Y_t - \phi_{1,0} - \phi_{1,1}Y_{t-1} - \cdots - \phi_{1,p}Y_{t-p})^2\, I(Y_{t-d} \le r) + (Y_t - \phi_{2,0} - \phi_{2,1}Y_{t-1} - \cdots - \phi_{2,p}Y_{t-p})^2\, I(Y_{t-d} > r) \Big\} \qquad (15.7.2)$$

where $I(Y_{t-d} \le r)$ equals 1 if $Y_{t-d} \le r$ and 0 otherwise; the expression $I(Y_{t-d} > r)$ is
similarly defined. Again, the optimization need only be done with r searched over the
observed Y’s and d an integer between 1 and p. The conditional least squares approach
has the advantage that the threshold parameter can be searched without any constraints.
Under mild conditions, including stationarity and that the true conditional mean function
is a discontinuous function, Chan (1993) showed that the CLS method is consistent;
that is, the estimator approaches the true value with increasing sample size. As the delay
is an integer, the consistency property implies that the delay estimator is eventually
equal to the true value with very large sample size. Furthermore, the sampling error of
the threshold estimator is of the order $1/n$, whereas the sampling error of the other
parameters is of order $1/\sqrt{n}$. The faster convergence of the threshold parameter and the
delay parameter to their true values implies that in assessing the uncertainty of the
autoregressive parameter estimates, the threshold and the delay may be treated as if they
were known. Consequently, the autoregressive parameter estimators from the two
regimes are approximately independent of each other, and their sampling distributions
are approximately the same as those from the ordinary least squares regression with data
from the corresponding true regimes. These large-sample distribution results can be
lifted to the case of the conditional maximum likelihood estimator provided the true
parameter satisfies the regularity conditions alluded to before. Finally, we note that the
preceding large-sample properties of the estimator are radically different if the true
conditional mean function is continuous; see Chan and Tsay (1998).
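The CLS criterion (15.7.2) can be minimized by a direct grid search, sketched here in base R. The function name `cls_tar` and the simulated series are hypothetical; the shortcut of assigning a zero residual sum of squares to a regime with no more cases than parameters is an assumption of this sketch (such a regime can generically be fitted exactly).

```r
cls_tar <- function(y, p) {
  E <- embed(y, p + 1)               # row for time t: Y_t, Y_{t-1}, ..., Y_{t-p}
  resp <- E[, 1]
  X <- cbind(1, E[, -1, drop = FALSE])
  rss <- function(idx) {             # residual sum of squares within one regime
    if (sum(idx) <= ncol(X)) return(0)
    sum(lm.fit(X[idx, , drop = FALSE], resp[idx])$residuals^2)
  }
  best <- list(L = Inf)
  for (d in 1:p) {
    z <- E[, d + 1]                  # threshold variable Y_{t-d}
    for (r in sort(unique(z))) {     # unconstrained search over observed values
      L <- rss(z <= r) + rss(z > r)
      if (L < best$L) best <- list(L = L, r = r, d = d)
    }
  }
  best
}

# Illustrative series whose conditional mean jumps at the threshold 0.
set.seed(4)
y <- numeric(300)
for (t in 2:300) {
  y[t] <- (if (y[t - 1] <= 0) 1 else -1) + 0.5 * y[t - 1] + rnorm(1)
}
fit <- cls_tar(y, p = 2)
```

The returned list holds the minimized criterion and the corresponding threshold and delay; the regime-wise coefficients then follow from ordinary least squares on each part of the split data.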

In practice, the AR orders in the two regimes need not be identical or known. Thus,
an efficient estimation procedure that also estimates the orders is essential. Recall that
for linear ARMA models, the AR orders can be estimated by minimizing the AIC. For
fixed r and d, the TAR model is essentially fitting two AR models of orders $p_1$ and $p_2$,
respectively, so that the AIC becomes

$$AIC(p_1, p_2, r, d) = -2\ell(r,d) + 2(p_1 + p_2 + 2) \qquad (15.7.3)$$

where the number of parameters, excluding r, d, $\sigma_1$, and $\sigma_2$, equals $p_1 + p_2 + 2$. Now, the
minimum AIC (MAIC) estimation method estimates the parameters by minimizing the
AIC subject to the constraint that the threshold parameter be searched over some interval
that guarantees each regime has adequate data for estimation. Adding 2 to the minimum
AIC so found gives the nominal AIC of the estimated threshold model,
based on the naive idea of counting the threshold parameter as one additional parameter.
Since the threshold parameter generally adds much flexibility to the model, it is likely to
add more than one degree of freedom to the model. An asymptotic argument suggests
that it may be equivalent to adding three degrees of freedom to the model; see Tong
(1990, p. 248).

We illustrate the estimation methods with the predator series. In the estimation, the
maximum order is set to be p = 4 and $1 \le d \le 4$. This maximum order is the AR order
determined by AIC, which is likely to be not smaller than the order of the true TAR
model. Alternatively, the order may be determined by cross-validation, which is
computer-intensive; see Cheng and Tong (1992). Using the MAIC method with the search of
threshold roughly between the tenth and ninetieth percentiles, the table in Exhibit 15.11
displays the nominal AIC value of the estimated TAR model for $1 \le d \le 4$. The nominal
AIC is smallest when d = 3, so we estimate the delay to be 3. The table in Exhibit 15.12
summarizes the corresponding model fit.


Exhibit 15.11 Nominal AIC of the TAR Models Fitted to the Log(predator)
Series for 1 ≤ d ≤ 4

d     AIC      $\hat r$    $\hat p_1$   $\hat p_2$
1     19.04    4.15        2            1
2     12.15    4.048       1            4
3     10.92    4.661       1            4
4     18.42    5.096       3            4

> AICM=NULL
> for(d in 1:4)
{predator.tar=tar(y=log(predator.eq),p1=4,p2=4,d=d,a=.1,b=.9)
> AICM=rbind(AICM,
c(d,predator.tar$AIC,signif(predator.tar$thd,4),
predator.tar$p1,predator.tar$p2))}
> colnames(AICM)=c('d','nominal AIC','r','p1','p2')
> rownames(AICM)=NULL
> AICM


Although the maximum autoregressive order is 4, the MAIC method selects order 1
for the lower regime and order 4 for the upper regime. The submodel in each regime is
estimated by ordinary least squares (OLS) using the data falling in that regime. Hence a
less biased estimator of the noise variance may be obtained from the within-regime residual
sum of squared errors normalized by the effective sample size, which equals the
number of data in that regime minus the number of autoregressive parameters (including
the intercept) of the corresponding submodel. The “unbiased” noise variance $\tilde\sigma_i^2$ of the
ith regime relates to its maximum likelihood counterpart by the formula

$$\tilde\sigma_i^2 = \frac{n_i}{n_i - p_i - 1}\,\hat\sigma_i^2 \qquad (15.7.4)$$

where $p_i$ is the autoregressive order of the ith submodel. Moreover,
$(n_i - p_i - 1)\tilde\sigma_i^2/\sigma_i^2$ is approximately distributed as $\chi^2$ with $n_i - p_i - 1$ degrees of
freedom. For each regime, the t-statistics and corresponding p-values reported in
Exhibit 15.12 are identical with the computer output for the case of fitting an
autoregressive model with the data falling in that regime. Notice that the coefficients of
lags 2 and 3 in the upper regime are not significant, while that of lag 4 is mildly
significant at the 5% significance level. Hence, the model for the upper regime may be
approximated by a first-order autoregressive model. We shall return to this point later.
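As a quick arithmetic check of Equation (15.7.4) and of the reported t-statistics, the following base R lines use values taken from Exhibit 15.12. The variable names are illustrative, and the use of a t reference distribution with $n_2 - p_2 - 1$ degrees of freedom is the usual OLS convention, assumed here rather than stated in the text.

```r
# Lower regime of Exhibit 15.12: n1 = 30 cases, order p1 = 1.
n1 <- 30
p1 <- 1
sigma2_tilde_1 <- 0.0548                               # "unbiased" noise variance
sigma2_hat_1 <- sigma2_tilde_1 * (n1 - p1 - 1) / n1    # invert (15.7.4) to get the MLE

# Upper regime, lag 4 coefficient: estimate / standard error, n2 = 23, p2 = 4.
t_lag4 <- -0.611 / 0.273
p_lag4 <- 2 * pt(-abs(t_lag4), df = 23 - 4 - 1)        # two-sided p-value
```

The maximum likelihood variance is smaller than its "unbiased" counterpart by the factor 28/30, and the recomputed lag 4 t-ratio agrees with the tabulated -2.24 up to rounding.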


Exhibit 15.12 Fitted TAR(2;1,4) Model for the Predator Data: MAIC Method

                      Estimate    Std. Error   t-statistic   p-value
$\hat d$              3
$\hat r$              4.661

Lower Regime ($n_1$ = 30)
$\hat\phi_{1,0}$      0.262       0.316        0.831         0.41
$\hat\phi_{1,1}$      1.02        0.0704       14.4          0.00
$\tilde\sigma_1^2$    0.0548

Upper Regime ($n_2$ = 23)
$\hat\phi_{2,0}$      4.20        1.28         3.27          0.00
$\hat\phi_{2,1}$      0.708       0.202        3.50          0.00
$\hat\phi_{2,2}$      -0.301      0.312        -0.965        0.35
$\hat\phi_{2,3}$      0.279       0.406        0.686         0.50
$\hat\phi_{2,4}$      -0.611      0.273        -2.24         0.04
$\tilde\sigma_2^2$    0.0560

> predator.tar.1=tar(y=log(predator.eq),p1=4,p2=4,d=3,a=.1,b=.9,
print=T)
> tar(y=log(predator.eq),p1=1,p2=4,d=3,a=.1,b=.9,print=T,
method='CLS') # re-do the estimation using the CLS method
> tar(y=log(predator.eq),p1=4,p2=4,d=3,a=.1,b=.9,print=T,
method='CLS') # the CLS method does not estimate the AR orders


The threshold estimate is 4.661, roughly the 57th percentile. In general, a threshold
estimate that is too close to the minimum or the maximum observation may be unreliable
due to small sample size in one of the regimes, which, fortunately, is not the case
here. Exhibit 15.12 does not report the standard error of the threshold estimate because
its sampling distribution is nonstandard and rather complex. Similarly, the discreteness
of the delay estimator renders its standard error useless. However, a parametric bootstrap
may be employed to draw inferences on the threshold and the delay parameters.
An alternative is to adopt the Bayesian approach of Geweke and Terui (1993). In contrast,
the fitted AR(4) model has the coefficient estimates of lags 1 to 4 equal to 0.943
(0.136), -0.171 (0.188), -0.1621 (0.186), and -0.238 (0.136), respectively, with their
standard errors enclosed in parentheses; the noise variance is estimated to be 0.0852,
which is substantially larger than the noise variances of the TAR(2;1,4) model. Notice
that the AR(4) coefficient estimate is close to being nonsignificant, and the AR(2) and
AR(3) coefficient estimates are not significant.

An interesting question concerns the interpretation of the two regimes. One way to
explore the nature of the regimes is to identify which data value falls in which regime in
the time series plot of the observed process. In the time series plot in Exhibit 15.2 on
page 387, data falling in the lower regime (that is, those whose lag 3 values are less than
4.661) are drawn as solid circles, whereas those in the upper regime are displayed as
open circles. The plot reveals that the estimated lower regime corresponds to the
increasing phase of the predator cycles and the upper regime corresponds to the decreasing
phase of the predator cycles. A biological interpretation is the following. When the
predator number was low one and a half days earlier, the prey species would have been
able to increase in the intervening period so that the predator species would begin to
thrive. On the other hand, when the predator numbered more than 106 ≈ exp(4.661) one
and a half days earlier, the prey species crashed in the intervening period so that the
predator species would begin to crash. The increasing phase (lower regime) of the predator
population tends to be associated with a robust growth of the prey series that may
be less affected by other environmental conditions. On the other hand, during the
decreasing phase (upper regime), the predator species would be more susceptible to
environmental conditions, as they were already weakened by having less food around.
This may explain why the lower regime has a slightly smaller noise variance than the
upper regime; hence the slight conditional heteroscedasticity. The difference of the
noise variance in the two regimes is unlikely to be significant, although the conditional
heteroscedasticity is more apparent in the TAR(2;1,1) model to be discussed below. In
general, the regimes defined by the relative position of the lag d values of the response
are proxies for some underlying latent process that effects the switching between the
linear submodels. With more substantive knowledge of the switching mechanism, the
threshold mechanism may, however, be explicitly modeled.

While the interpretation of the regimes above is based on the time series plot, it may
be confirmed by examining the fitted submodels. The fitted model of the lower regime
implies that on the logarithmic scale

$$Y_t = 0.262 + 1.02\,Y_{t-1} + 0.234\,e_t \qquad (15.7.5)$$

The lag 1 coefficient is essentially equal to 1 and suggests that the predator species
had a (median) growth rate of (exp(0.262) - 1)100% ≈ 30% every half day, although the
intercept is not significant at the 5% level. This submodel is explosive because $Y_t \to \infty$
as $t \to \infty$ if left unchecked.
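The two back-transformations used in this interpretation are easy to verify numerically; the variable names below are illustrative only.

```r
# Median growth per half day implied by the lower-regime intercept, in percent.
growth_pct <- (exp(0.262) - 1) * 100

# Threshold 4.661 on the log scale, converted back to a predator count.
threshold_count <- exp(4.661)

# Noise standard deviation in (15.7.5) versus the tabulated variance 0.0548.
sd_lower <- sqrt(0.0548)
```

The growth rate comes out just under 30%, the threshold corresponds to about 106 individuals, and the square root of 0.0548 reproduces the coefficient 0.234 of the noise term.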




Interpretation of the fitted model of the upper regime is less straightforward
because it is an order 4 model. However, it was suggested earlier that it may be
approximated by an AR(1) model. Taking up this suggestion, we reestimated the TAR model
with the maximum order being 1 for both regimes.† The threshold estimate is
unchanged. The lower regime gains one data case, with less of an initial data requirement,
but the autoregressive coefficients are almost unchanged. The fitted model of the
upper regime becomes

$$Y_t = 0.517 + 0.807\,Y_{t-1} + 0.989\,e_t \qquad (15.7.6)$$

which is a stationary submodel. The growth rate on the logarithmic scale equals

$$Y_t - Y_{t-1} = 0.517 - 0.193\,Y_{t-1} + 0.989\,e_t \qquad (15.7.7)$$

which has a negative median since $Y_{t-1} > 4.661$ on the upper regime. Notice that the
conditional heteroscedasticity is more apparent now than in the fitted TAR(2;1,4) model.
The (nominal) AIC of the TAR(2;1,1) model with d = 3 equals 14.78, which is, however,
not directly comparable with 10.92 of the TAR(2;1,4) model because of the difference
in sample size. Models with different sample sizes may be compared by their nominal
AIC per observation. In this case, the normalized AIC increases from 0.206 = 10.92/53
to 0.274 = 14.78/54 when the order is decreased from 4 to 1, suggesting that the
TAR(2;1,4) model is preferable to the TAR(2;1,1) model.
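The per-observation comparison amounts to the following arithmetic, shown as a short R sketch with illustrative variable names; the AIC values and effective sample sizes are those quoted in the text.

```r
# Nominal AIC divided by the effective sample size of each fit.
aic_per_obs <- c("TAR(2;1,4)" = 10.92 / 53, "TAR(2;1,1)" = 14.78 / 54)
aic_rounded <- round(aic_per_obs, 3)

# The model with the smaller normalized AIC is preferred.
preferred <- names(which.min(aic_per_obs))
```

The sample sizes differ by one because the first-order fit needs fewer initial values, which is why the raw AIC values are not directly comparable.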

Another way to assess a nonlinear model is to examine the long-term (asymptotic)
behavior of its skeleton. Recall that the skeleton of a model is obtained by suppressing
the noise term from the model; that is, replacing the noise term by 0. The skeleton may
diverge to infinity, or it may converge to a limit point, a limit cycle, or a strange
attractor; see Chan and Tong (2001) for definitions and further discussion. The skeleton of a
stationary ARMA model always converges to some limit point. On the other hand, the
skeleton of a stationary nonlinear model may display the full complexity of dynamics
alluded to earlier. The skeleton of the fitted TAR(2;1,4) model appears to converge to a
limit cycle of period 10, as shown in Exhibit 15.13. The limit cycle is symmetric in the
sense that its increase phase and decrease phase are of the same length. The apparent
long-run stability of the skeleton suggests that the fitted TAR(2;1,4) model with d = 3 is
stationary. In general, with the noise term in the model, the dynamic behavior of the
model may be studied by simulating some series from the stochastic model. Exhibit
15.14 shows a typical realization from the fitted TAR(2;1,4) model.


† > predator.tar.2=tar(log(predator.eq),p1=1,p2=1,d=3,a=.1,
b=.9,print=T)
Exhibit 15.13 Skeleton of the TAR(2;1,4) Model for the Predator Series

[Plot omitted; horizontal axis: t.]

> tar.skeleton(predator.tar.1)


Exhibit 15.14 Simulated TAR(2;1,4) Series

[Plot omitted; horizontal axis: t.]

> set.seed(356813)
> plot(y=tar.sim(n=57,object=predator.tar.1)$y,x=1:57,
ylab=expression(Y[t]),xlab=expression(t),type='o')


The limit cycle of the skeleton of the fitted TAR(2;1,1) model with d = 3 is asymmetric,
with the increase phase of length 5 and the decrease phase of length 4; see
Exhibit 15.15. A realization of the fitted TAR(2;1,1) model is shown in Exhibit 15.16.
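For readers who want to see how such a skeleton is iterated without the `tar.skeleton` function, here is a rough base R sketch. It is an approximation under stated assumptions: the upper-regime coefficients come from Equation (15.7.6), the lower-regime coefficients are borrowed from Equation (15.7.5) because the text reports them as almost unchanged, and the function name `tar11_skeleton` and the starting value 4 are hypothetical.

```r
# Noise-free recursion of a TAR(2;1,1) model with delay d and threshold r.
tar11_skeleton <- function(n = 200, r = 4.661, d = 3, start = 4,
                           lower = c(0.262, 1.02),    # assumed, from (15.7.5)
                           upper = c(0.517, 0.807)) { # from (15.7.6)
  y <- rep(start, n)                 # the first d values act as initial conditions
  for (t in (d + 1):n) {
    cf <- if (y[t - d] <= r) lower else upper
    y[t] <- cf[1] + cf[2] * y[t - 1]
  }
  y
}

sk <- tar11_skeleton()
tail_sk <- tail(sk, 100)             # discard the initial transient
sk_range <- range(tail_sk)
```

The iterates remain in a bounded band around the threshold rather than diverging, even though the lower-regime submodel is explosive on its own; plotting `tail_sk` against time shows the oscillation.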
Exhibit 15.15 Skeleton of the First-Order TAR Model for the Predator
Series

[Plot omitted; horizontal axis: t.]

> predator.tar.2=tar(log(predator.eq),p1=1,p2=1,d=3,a=.1,b=.9,
print=T)
> tar.skeleton(predator.tar.2)


Exhibit 15.16 Simulation of the Fitted TAR(2;1,1) Model

[Plot omitted; horizontal axis: t.]

> set.seed(356813)
> plot(y=tar.sim(n=57,object=predator.tar.2)$y,x=1:57,
ylab=expression(Y[t]),xlab=expression(t),type='o')


For the predator data, excluding the two initial transient cycles and the last incomplete
cycle, the table in Exhibit 15.17 lists the length of the successive increasing and
decreasing phases. Observe that the mean length of the increasing phases is 5.4 and that
of the decreasing phases is 4.6.

Exhibit 15.17 Length of the Increasing and Decreasing Phases of the
Predator Series

Phase
Increasing    6   7   5   4   5
Decreasing    4   5   4   5   5

There is some evidence of asymmetry with a longer increase phase than the
decrease phase. Based on the cycle length analysis, the TAR(2;1,1) model appears to
pick up the asymmetric cycle property better than the TAR(2;1,4) model, but the latter
model gets the cycle length better matched to the observed average cycle length. A more
rigorous comparison between the cyclical behavior of a fitted model and that of the data
can be done by comparing the spectral density of the data with that of a long realization
from the fitted model. Exhibit 15.18 plots the spectrum of the data using a modified
Daniell window with a (3,3) span. Also plotted is the spectrum of the fitted TAR(2;1,4)
model (dashed line) and that of the fitted TAR(2;1,1) model (dotted line), both of which
are based on a simulated realization of size 10,000, a modified Daniell window with a
(200,200) span, and 10% tapering. It can be seen that the spectrum of the TAR(2;1,4)
model follows that of the predator series quite closely and is slightly better than the
simplified TAR(2;1,1) model.


Exhibit 15.18 Spectra of Log(predator) Series, Dashed Line for TAR(2;1,4),
Dotted Line for TAR(2;1,1)


> set.seed(2357125)
> yy.1.4=tar.sim(predator.tar.1,n=10000)$y
> yy.1=tar.sim(predator.tar.2,n=10000)$y
> spec.1.4=spec(yy.1.4,taper=.1,span=c(200,200),plot=F)
> spec.1=spec(yy.1,taper=.1,span=c(200,200),plot=F)
> spec.predator=spec(log(predator.eq),taper=.1,
span=c(3,3),plot=F)
> spec.predator=spec(log(predator.eq),taper=.1,span=c(3,3),
ylim=range(c(spec.1.4$spec,spec.1$spec,spec.predator$spec)))
> lines(y=spec.1.4$spec,x=spec.1.4$freq,lty=2)
> lines(y=spec.1$spec,x=spec.1$freq,lty=3)


We note that the conditional least squares method with the predator data yields the
same threshold estimate for d = 3 and hence also the other parameter estimates, although
this need not always be the case. Finally, a couple of clarifying remarks on the predator
series analysis are in order. As the experimental prey series is also available, a bivariate
time series analysis may be studied. But it is not pursued here since nonlinear time
series analysis with multiple time series is not a well-charted area. Moreover, real
biological data are often observational, and abundance data of the prey population are often
much noisier than those of the predator population because the predator population
tends to be fewer in number than the prey population. Furthermore, predators may
switch from their favorite prey food to other available prey species when the former
becomes scarce, rendering a more complex prey-predator system. For example, in a
good year, hares may be seen hopping around in every corner in the neighborhood,
whereas it is rare to spot a lynx, their predator! Thus, biological analysis often focuses
on the abundance data of the predator population. Nonetheless, univariate time series
analysis of the abundance of the predator species may shed valuable biological insights
on the prey-predator interaction; see Stenseth et al. (1998, 1999) for some relevant
discussion on a panel of Canadian lynx series. For the lynx data, a TAR(2;2,2) model with
delay equal to 2 is the prototypical model, with delay 2 lending some nice biological
interpretations. We note that, for the predator series, delay 2 is very competitive (see
Exhibit 15.11) and hence may be preferred on biological grounds. In one exercise, we
ask the reader to fit a TAR model for the predator series with delay set to 2 and interpret
the findings by making use of the framework studied in Stenseth et al. (1998, 1999).

15.8  Model  Diagnostics 


In Section 15.7, we introduced some model diagnostic techniques; for example, skeleton
analysis and simulation. Here, we discuss some formal statistical approaches to
model diagnostics via residual analysis. The raw residuals are obtained by subtracting the
fitted values from the data, where the $t$th fitted value is the estimated conditional mean of
$Y_t$ given past values of the $Y$'s; that is, the residuals $\hat{\varepsilon}_t$ are given by

$$
\begin{aligned}
\hat{\varepsilon}_t = Y_t &- \{\hat{\phi}_{1,0} + \hat{\phi}_{1,1}Y_{t-1} + \cdots + \hat{\phi}_{1,p_1}Y_{t-p_1}\}\, I(Y_{t-\hat{d}} \le \hat{r}) \\
&- \{\hat{\phi}_{2,0} + \hat{\phi}_{2,1}Y_{t-1} + \cdots + \hat{\phi}_{2,p_2}Y_{t-p_2}\}\, I(Y_{t-\hat{d}} > \hat{r})
\end{aligned}
\qquad (15.8.1)
$$

These are the same as the raw residuals from the fitted submodels. The standardized
residuals are obtained by normalizing the raw residuals by their appropriate standard
deviations:

$$
\hat{e}_t = \frac{\hat{\varepsilon}_t}{\hat{\sigma}_1 I(Y_{t-\hat{d}} \le \hat{r}) + \hat{\sigma}_2 I(Y_{t-\hat{d}} > \hat{r})}
\qquad (15.8.2)
$$

that is, raw residuals from the lower (upper) regime are normalized by the noise standard
deviation estimate of the lower (upper) regime. As in the linear case, the time series
plot of the standardized residuals should look random, as they should be approximately
independent and identically distributed if the TAR model is the true data mechanism;
that is, if the TAR model is correctly specified. As before, we look for the presence of
outliers and any systematic pattern in such a plot, either of which may provide a clue for
specifying a more appropriate model. The independence assumption of the standardized
errors can be checked by examining the sample ACF of the standardized residuals.
Nonconstant variance may be checked by examining the sample ACF of the squared
standardized residuals or that of the absolute standardized residuals.
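
The residual definitions above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the series is simulated from a first-order TAR model, and the "estimates" (phi1, phi2, sigma1, sigma2, r, d) are hypothetical values set equal to those used in the simulation rather than obtained from a fit. In practice they would come from a fitted model object.

```r
# Illustrative sketch: raw and standardized residuals of a TAR(2;1,1) model.
# All parameter values are assumptions chosen for this example.
set.seed(1)
n <- 500
phi1 <- c(0.5, 0.6)     # lower regime: intercept, lag-1 coefficient
phi2 <- c(-0.5, -0.4)   # upper regime: intercept, lag-1 coefficient
sigma1 <- 1; sigma2 <- 2
r <- 0; d <- 1

# Simulate the series, starting from y[1] = 0
y <- numeric(n)
for (t in 2:n) {
  if (y[t - d] <= r) {
    y[t] <- phi1[1] + phi1[2] * y[t - 1] + sigma1 * rnorm(1)
  } else {
    y[t] <- phi2[1] + phi2[2] * y[t - 1] + sigma2 * rnorm(1)
  }
}

# Residuals are defined for t > max(p1, p2, d) = 1
tt <- 2:n
lower <- y[tt - d] <= r                          # regime indicator I(Y[t-d] <= r)
fit.val <- ifelse(lower,
                  phi1[1] + phi1[2] * y[tt - 1],
                  phi2[1] + phi2[2] * y[tt - 1])
raw.res <- y[tt] - fit.val                       # raw residuals
std.res <- raw.res / ifelse(lower, sigma1, sigma2)  # regime-specific scaling
```

The standardized residuals can then be inspected with, for example, plot(std.res, type='l'), acf(std.res), and acf(std.res^2).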

Here, we consider the generalization of the portmanteau test based on some overall
measure of the magnitude of the residual autocorrelations. The reader may want to
review the discussion in Section 12.5 on page 301, where we explain that even if the
model is correctly specified, the residuals are generally dependent and so are their
sample autocorrelations. Unlike the case of linear ARIMA models, the dependence of the
residuals necessitates the employment of a (complex) quadratic form of the residual
autocorrelations:

$$
B_m = n_{\mathrm{eff}} \sum_{i=1}^{m} \sum_{j=1}^{m} q_{i,j}\, \hat{\rho}_i \hat{\rho}_j
\qquad (15.8.3)
$$

where $n_{\mathrm{eff}} = n - \max\{p_1, p_2, d\}$ is the effective sample size, $\hat{\rho}_i$ is the
$i$th-lag sample autocorrelation of the standardized residuals, and the $q_{i,j}$ are some
model-dependent constants given in Appendix L on page 421. If the true model is a TAR
model, the $\hat{\rho}_i$ are likely close to zero and so is $B_m$, but $B_m$ tends to be large if
the model specification is incorrect. The quadratic form is designed so that $B_m$ is
approximately distributed as $\chi^2$ with $m$ degrees of freedom. Mathematical theory predicts
that the $\chi^2$ distribution approximation is generally more accurate with larger sample
size and with $m$ small relative to the sample size.

In practice, the p-value of $B_m$ may be plotted against $m$ over a range of $m$ values to
provide a more comprehensive assessment of the independence assumption on the
standardized errors. The bottom figure of Exhibit 15.19 reports the portmanteau test of the
TAR(2;1,1) model fitted to the predator series discussed earlier for $1 \le m \le 12$. The top
figure there is the time series plot of the standardized residuals. Except for a possible
outlier, the plot shows no particular pattern. The middle figure is the ACF plot of the
standardized residuals. The confidence band is based on the simple $1.96/\sqrt{n}$ rule and
should be regarded as a rough guide on the significance of the residual ACF. It suggests
that the lag 1 residual autocorrelation is significant. The more rigorous portmanteau
tests are all significant for $m < 6$, suggesting a lack of fit for the TAR(2;1,1) model.
Similar diagnostics for the TAR(2;1,4) model are shown in Exhibit 15.20. Now, the only
potential problem is a possible outlier. However, the fitted model changed little upon
deleting the last four data points, including the potential outlier; hence we conclude that
the fitted TAR(2;1,4) model is fairly robust. Exhibit 15.21 displays the QQ normal score
plot of the standardized residuals, which is apparently straight, and hence the errors
appear to be normally distributed. In summary, the fitted TAR(2;1,4) model provides a
good fit to the predator series.

Exhibit 15.19 Model Diagnostics of the First-Order TAR Model: Predator
Series

[Figure: three panels, from top to bottom: Standardized Residuals, ACF of Residuals, P-values]

> win.graph(width=4.875,height=4.5)
> tsdiag(predator.tar.2,gof.lag=20)

[Exhibit 15.20: model diagnostics of the TAR(2;1,4) model fitted to the predator series;
panels: Standardized Residuals, ACF of Residuals, P-values]

> tsdiag(predator.tar.1,gof.lag=20)

Exhibit 15.21 QQ Normal Plot of the Standardized Residuals

[Figure: Sample Quantiles plotted against Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(predator.tar.1$std.res); qqline(predator.tar.1$std.res)
15.9  Prediction 

In this section, we consider the problem of predicting future values from a TAR process.
In practice, prediction is based on an estimated TAR model. But, as in the case of
ARIMA models, the uncertainty due to parameter estimation is generally small compared
with the natural variation of the underlying process. So, we shall proceed below as
if the fitted model were the true model. The uncertainty of a future value, say $Y_{t+\ell}$, is
completely characterized by its conditional probability distribution given the current and
past data $Y_t, Y_{t-1}, \ldots$, referred to as the $\ell$-step-ahead predictive distribution below. For
ARIMA models with normal errors, all predictive distributions are normal, which
greatly simplifies the computation of a predictive interval, as it suffices to find the mean
and variance of the predictive distribution. However, for nonlinear models, the predictive
distributions are generally nonnormal and often intractable. Hence, a prediction
interval may have to be computed by brute force via simulation. The simulation
approach may be best explained in the context of a first-order nonlinear autoregressive
model:

$$
Y_{t+1} = h(Y_t, e_{t+1}) \qquad (15.9.1)
$$

Given $Y_t = y_t, Y_{t-1} = y_{t-1}, \ldots$, we have $Y_{t+1} = h(y_t, e_{t+1})$, so a realization of $Y_{t+1}$ from
the one-step-ahead predictive distribution can be obtained by drawing $e_{t+1}$ from the
error distribution and computing $h(y_t, e_{t+1})$. Repeating this procedure independently $B$
times, say 1000 times, we get a random sample of $B$ values from the one-step-ahead
predictive distribution. The one-step-ahead predictive mean may be estimated by the
sample mean of these $B$ values. However, it is important to inspect the shape of the
one-step-ahead predictive distribution in order to decide how best to summarize the
predictive information. For example, if the predictive distribution is multimodal or very
skewed, the one-step-ahead predictive mean need not be an appropriate point predictor.
A generally useful approach is to construct a 95% prediction interval for $Y_{t+1}$; for
example, the interval defined by the 2.5th percentile to the 97.5th percentile of the
simulated $B$ values.

The simulation approach can be readily extended to finding the $\ell$-step-ahead predictive
distribution for any integer $\ell \ge 2$ by iterating the nonlinear autoregression:

$$
\begin{aligned}
Y_{t+1} &= h(Y_t, e_{t+1}) \\
Y_{t+2} &= h(Y_{t+1}, e_{t+2}) \\
&\;\;\vdots \\
Y_{t+\ell} &= h(Y_{t+\ell-1}, e_{t+\ell})
\end{aligned}
\qquad (15.9.2)
$$

where $Y_t = y_t$ and $\{e_{t+1}, e_{t+2}, \ldots, e_{t+\ell}\}$ is a random sample of $\ell$ values drawn from the
error distribution. This procedure may be repeated $B$ times to yield a random sample from the
$\ell$-step-ahead predictive distribution, with which we can compute prediction intervals of
$Y_{t+\ell}$ or any other predictive summary statistic.

Indeed, the $\ell$-tuple $(Y_{t+1}, \ldots, Y_{t+\ell})$ is a realization from the joint predictive
distribution of the first $\ell$-step-ahead predictions. So, the procedure above actually yields a
random sample of $B$ vectors from the joint predictive distribution of the first $\ell$-step-ahead
predictions.
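
The iteration above can be sketched in base R as follows. This is a minimal illustration, not the routine used to produce the exhibits: the function h, its coefficients, the starting value y.t, and the helper name simulate.paths are all hypothetical choices made for the example (a first-order TAR model with threshold 0 and delay 1, with standard normal errors).

```r
# Illustrative sketch: simulating predictive distributions by iterating
# Y[t+k] = h(Y[t+k-1], e[t+k]).  The form of h and all numeric values are
# assumptions made for this example.
h <- function(y, e) {
  ifelse(y <= 0,
         0.5 + 0.6 * y + 1.0 * e,    # lower regime
         -0.5 - 0.4 * y + 2.0 * e)   # upper regime
}

simulate.paths <- function(y.t, n.ahead, B) {
  paths <- matrix(NA_real_, nrow = B, ncol = n.ahead)
  y <- rep(y.t, B)                   # every path starts from the observed y_t
  for (k in 1:n.ahead) {
    y <- h(y, rnorm(B))              # advance all B paths by one step
    paths[, k] <- y
  }
  paths                              # row b is one draw of (Y[t+1], ..., Y[t+n.ahead])
}

set.seed(123)
paths <- simulate.paths(y.t = -1, n.ahead = 5, B = 1000)

# 2.5th, 50th, and 97.5th percentiles at each lead time
pred.int <- apply(paths, 2, quantile, probs = c(0.025, 0.5, 0.975))
```

Each row of paths is one draw from the joint predictive distribution, so column k of paths is a sample from the k-step-ahead predictive distribution, and pred.int holds the corresponding 95% prediction limits and predictive medians.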

Henceforth in this section, we focus on the prediction problem when the true model
is a TAR model. Fortunately, the simulation approach is not needed for computing the
one-step-ahead predictive distribution in the case of a TAR model. To see this, consider
the simple case of a first-order TAR model. In this case, $Y_{t+1-d}$ is known, so that the
regime for $Y_{t+1}$ is known. If $Y_{t+1-d} \le r$, then $Y_{t+1}$ follows the AR(1) model

$$
Y_{t+1} = \phi_{1,0} + \phi_{1,1} Y_t + \sigma_1 e_{t+1} \qquad (15.9.3)
$$

Because $Y_t = y_t$ is fixed, the conditional distribution of $Y_{t+1}$ is normal with mean equal
to $\phi_{1,0} + \phi_{1,1} y_t$ and variance $\sigma_1^2$. Similarly, if $Y_{t+1-d} > r$, $Y_{t+1}$ follows the AR(1) model of
the upper regime so that, conditionally, it is normal with mean $\phi_{2,0} + \phi_{2,1} y_t$ and variance
$\sigma_2^2$. A similar argument shows that, for any TAR model, the one-step-ahead predictive
distribution is normal. The predictive mean is, however, a piecewise linear function, and
the predictive standard deviation is piecewise constant.

Similarly, it can be shown that if $\ell \le d$, then the $\ell$-step-ahead predictive distribution
of a TAR model is also normal. But if $\ell > d$, the $\ell$-step-ahead predictive distribution is no
longer normal. The problem can be illustrated in the simple case of a first-order TAR
model with $d = 1$ and $\ell = 2$. While $Y_{t+1}$ follows a fixed linear model determined by the
observed value of $Y_t$, $Y_{t+2}$ may be in the lower or upper regime, depending on the
random value of $Y_{t+1}$. Suppose that $y_t \le r$. Now, $Y_{t+1}$ falls in the lower regime if
$Y_{t+1} = \sigma_1 e_{t+1} + \phi_{1,0} + \phi_{1,1} y_t \le r$, which happens with probability
$p_t = \Pr(\sigma_1 e_{t+1} + \phi_{1,0} + \phi_{1,1} y_t \le r)$ and in which case

$$
\begin{aligned}
Y_{t+2} &= \sigma_1 e_{t+2} + \phi_{1,0} + \phi_{1,1} Y_{t+1} \\
&= \sigma_1 e_{t+2} + \phi_{1,1}\sigma_1 e_{t+1} + \phi_{1,1}\phi_{1,0} + \phi_{1,1}^2 y_t + \phi_{1,0}
\end{aligned}
\qquad (15.9.4)
$$

which is a normal distribution with mean equal to $\phi_{1,1}\phi_{1,0} + \phi_{1,1}^2 y_t + \phi_{1,0}$ and
variance $\sigma_1^2 + \phi_{1,1}^2 \sigma_1^2$. On the other hand, with probability $1 - p_t$, $Y_{t+1}$ falls in the upper
regime, in which case the conditional distribution of $Y_{t+2}$ is normal but with mean
$\phi_{2,1}(\phi_{1,0} + \phi_{1,1} y_t) + \phi_{2,0}$ and variance $\sigma_2^2 + \phi_{2,1}^2 \sigma_1^2$. Therefore, the conditional
distribution of $Y_{t+2}$ is a mixture of two normal distributions. Note that the mixture probability $p_t$
depends on $y_t$. In particular, the higher-step-ahead predictive distributions are nonnormal
for a TAR model if $\ell > d$, and so we have to resort to simulation to find the predictive
distributions.
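
As a small numerical illustration of the regime probability $p_t$, the base R sketch below evaluates it with pnorm and checks the value against a Monte Carlo estimate. The coefficients, threshold, and observed value y.t are hypothetical numbers chosen only for this example; standard normal errors are assumed, as in the text.

```r
# Illustrative sketch for a first-order TAR model with d = 1.
# All parameter values are assumptions made for this example.
phi10 <- 0.5;  phi11 <- 0.6;  sigma1 <- 1   # lower regime
phi20 <- -0.5; phi21 <- -0.4; sigma2 <- 2   # upper regime
r <- 0
y.t <- -1                                   # observed value, in the lower regime

# p_t = Pr(sigma1 * e + phi10 + phi11 * y.t <= r) for standard normal e
p.t <- pnorm((r - phi10 - phi11 * y.t) / sigma1)

# Monte Carlo check: simulate Y[t+1], then Y[t+2] from the regime Y[t+1] selects
set.seed(2024)
B <- 10000
y1 <- phi10 + phi11 * y.t + sigma1 * rnorm(B)
in.lower <- y1 <= r
y2 <- ifelse(in.lower,
             phi10 + phi11 * y1 + sigma1 * rnorm(B),
             phi20 + phi21 * y1 + sigma2 * rnorm(B))
p.sim <- mean(in.lower)                     # simulated proportion in lower regime
```

Comparing p.sim with p.t confirms the regime probability, and qqnorm(y2); qqline(y2) can be used to inspect how far the simulated two-step-ahead distribution departs from normality.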

As an example, we compute the prediction intervals for the logarithmically transformed
predator data based on the fitted TAR(2;1,4) model with d = 3; see Exhibit
15.22, where the middle dashed line is the median of the predictive distribution and the
other dashed lines are the 2.5th and 97.5th percentiles of the predictive distribution.


Exhibit 15.22 Prediction of the Predator Series

[Figure: log predator series followed by the predictive medians and 95% prediction limits
as dashed lines; horizontal axis: t]

> set.seed(2357125)
> win.graph(width=4.875,height=2.5,pointsize=8)
> pred.predator=predict(predator.tar.1,n.ahead=60,n.sim=10000)
> yy=ts(c(log(predator.eq),pred.predator$fit),frequency=2,
    start=start(predator.eq))
> plot(yy,type='n',ylim=range(c(yy,pred.predator$pred.interval)),
    ylab='Log Predator',xlab=expression(t))
> lines(log(predator.eq))
> lines(window(yy,start=end(predator.eq)+c(0,1)),lty=2)
> lines(ts(pred.predator$pred.interval[2,],
    start=end(predator.eq)+c(0,1),freq=2),lty=2)
> lines(ts(pred.predator$pred.interval[1,],
    start=end(predator.eq)+c(0,1),freq=2),lty=2)

The simulation size here is 10,000. In practice, a smaller size such as 1000 may be
adequate. The median of the predictive distribution can serve as a point predictor.
Notice that the predictive medians display the cyclical pattern of the predator data
initially and then approach the long-run median with increasing number of steps ahead.
Similarly, the predictive intervals approach the interval defined by the 2.5th and 97.5th
percentiles of the stationary distribution of the fitted TAR model. However, a new
feature is that prediction need not be less certain with increasing number of steps ahead, as
the length of the prediction intervals does not increase monotonically with increasing
number of steps ahead; see Exhibit 15.23. This is radically different from the case of
ARIMA models, for which the prediction variance always increases with the number of
prediction steps ahead.


Exhibit  15.23  Width  of  the  95%  Prediction  Intervals  Against  Lead  Time 


> plot(ts(apply(pred.predator$pred.interval,2,
    function(x){x[2]-x[1]})),
    ylab='Length of Prediction Intervals',
    xlab='Number of Steps Ahead')


Recall that, for the TAR model, the predictive distribution is normal if and only if
the number of steps ahead $\ell \le d$. Exhibit 15.24 shows the QQ normal score plot of the
three-step-ahead predictive distribution, which is fairly straight. On the other hand, the
QQ normal score plot of the six-step-ahead predictive distribution (Exhibit 15.25) is
consistent with nonnormality.

Exhibit 15.24 QQ Normal Plot of the Three-Step-Ahead Predictive
Distribution

[Figure: normal QQ plot; horizontal axis: Theoretical Quantiles]

> win.graph(width=2.5,height=2.5,pointsize=8)
> qqnorm(pred.predator$pred.matrix[,3])
> qqline(pred.predator$pred.matrix[,3])


Exhibit 15.25 QQ Normal Plot of the Six-Step-Ahead Predictive
Distribution

[Figure: normal QQ plot; horizontal axis: Theoretical Quantiles]

> qqnorm(pred.predator$pred.matrix[,6])
> qqline(pred.predator$pred.matrix[,6])

15.10 Summary


In this chapter, we have introduced an important nonlinear time series model: the
threshold model. We have shown how to test for nonlinearity and, in particular, for
threshold nonlinearity. We then proceeded to consider the estimation of the unknown
parameters in these models using both the minimum AIC (MAIC) criterion and the
conditional least squares approach. As with all models, we learned how to criticize them
through various model diagnostics, including an extended portmanteau test. Finally, we
demonstrated how to form predictions from threshold models, including the calculation
and display of prediction intervals. Several substantial examples were used to illustrate
the methods and techniques discussed.


Exercises 


15.1 Fit a TAR model for the predator series with delay set to 2, and interpret the
findings by making use of the framework studied in Stenseth et al. (1998, 1999). (You
may first want to check whether or not their framework is approximately valid for
the TAR model.) Also, compare the fitted model with the TAR(2;1,4) model with
delay 3 reported in the text. (The data file is named veilleux.)

15.2 Fit a TAR model to the square-root-transformed relative sunspot data, and
examine its goodness of fit. Interpret the fitted TAR model. (The data file is named
spots.)

15.3 Predict the annual relative sunspot numbers for ten years using the fitted model
obtained in Exercise 15.2. Draw the prediction intervals and the predicted
medians. (The data file is named spots.)

15.4  Examine  the  long-run  behavior  of  the  skeleton  of  the  fitted  model  for  the  relative 
sunspot  data.  Is  the  fitted  model  likely  to  be  stationary?  Explain  your  answer. 

15.5  Simulate  a  series  of  size  1000  from  the  TAR  model  fitted  to  the  relative  sunspot 
data.  Compute  the  spectrum  of  the  simulated  realization  and  compare  it  with  the 
spectrum  of  the  data.  Does  the  fitted  model  capture  the  correlation  structure  of  the 
data? 

15.6  Draw  the  lagged  regression  plots  for  the  square-root-transformed  hare  data.  Is 
there  any  evidence  that  the  hare  data  are  nonlinear?  (The  data  file  is  named  hare.) 

15.7  Carry  out  formal  tests  (Keenan’s  test,  Tsay’s  test,  and  threshold  likelihood  ratio 
test)  for  nonlinearity  for  the  hare  data.  Is  the  hare  abundance  process  nonlinear? 
Explain  your  answer.  (The  data  file  is  named  hare.) 

15.8 Assuming that the hare data are nonlinear, fit a TAR model to the hare data and
examine the goodness of fit. (The data file is named hare.)

15.9 This exercise assumes that the reader is familiar with Markov chain theory.
Consider a simple TAR model that is piecewise constant:

$$
Y_t = \begin{cases}
\phi_{1,0} + \sigma_1 e_t, & \text{if } Y_{t-1} \le r \\
\phi_{2,0} + \sigma_2 e_t, & \text{if } Y_{t-1} > r
\end{cases}
$$

where $\{e_t\}$ are independent standard normal random variables. Let $R_t = 1$ if $Y_t \le r$
and 2 otherwise, which is a Markov chain.

(a) Find the transition probability matrix of $R_t$ and its stationary distribution.

(b) Derive the stationary distribution of $\{Y_t\}$.

(c) Find the lag 1 autocovariance of the TAR process.


Appendix  L:  The  Generalized  Portmanteau  Test  for  TAR 


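The basis of the portmanteau test is the result that, if the TAR model is correctly
specified, $\hat{\rho}_1, \hat{\rho}_2, \ldots, \hat{\rho}_m$ are approximately jointly normally distributed with zero mean and
covariances $\mathrm{Cov}(\hat{\rho}_i, \hat{\rho}_j) = q_{i,j}$, where $Q$ is an $m \times m$ matrix whose $(i,j)$ element equals
$q_{i,j}$ and whose formula is given below; see Chan (2008) for a proof of this result. It can
be shown that $Q = I - UV^{-1}U^{T}$, where $I$ is an $m \times m$ identity matrix,

$$
U = E\left\{
\begin{bmatrix} e_{t-1} \\ e_{t-2} \\ \vdots \\ e_{t-m} \end{bmatrix}
\left[\, I_t,\ Y_{t-1}I_t,\ \ldots,\ Y_{t-p_1}I_t,\ (1-I_t),\ Y_{t-1}(1-I_t),\ \ldots,\ Y_{t-p_2}(1-I_t) \,\right]
\right\}
$$

where $I_t = I(Y_{t-d} \le r)$, the expectation of a matrix is taken elementwise, and

$$
V = E\left\{
\begin{bmatrix} I_t \\ Y_{t-1}I_t \\ \vdots \\ Y_{t-p_1}I_t \\ (1-I_t) \\ Y_{t-1}(1-I_t) \\ \vdots \\ Y_{t-p_2}(1-I_t) \end{bmatrix}
\left[\, I_t,\ Y_{t-1}I_t,\ \ldots,\ Y_{t-p_1}I_t,\ (1-I_t),\ Y_{t-1}(1-I_t),\ \ldots,\ Y_{t-p_2}(1-I_t) \,\right]
\right\}
$$

These expectations can be approximated by sample averages computed with the true
errors replaced by the standardized residuals and the unknown parameters by their
estimates. For example, $E\{e_{t-1}I(Y_{t-d} \le r)\}$ can be approximated by

$$
\frac{1}{n}\sum_{t=1}^{n} \hat{e}_{t-1}\, I(Y_{t-\hat{d}} \le \hat{r})
$$

where the initial standardized residuals $\hat{e}_t = 0$ for $t \le \max(p_1, p_2, d)$.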


Appendix:  An  Introduction  to  R 

Introduction 


All of the plots and numerical output displayed in this book were produced with the R
software, which is available at no cost from the R Project for Statistical Computing. The
software is available under the terms of the Free Software Foundation's GNU General
Public License in source code form. It runs on a wide variety of operating systems,
including Windows, Mac OS, UNIX, and similar systems, including FreeBSD and
Linux. R is a language and environment for statistical computing and graphics, provides
a wide variety of statistical methods (time series analysis, linear and nonlinear modeling,
classical statistical tests, and so forth) and graphical techniques, and is highly extensible.
In particular, one of the authors (KSC) has produced a large number of new or
enhanced R functions specifically tailored to the methods described in this book. They
are available for download in an R package named TSA on the R Project Website at
www.r-project.org. The TSA functions are listed on page 468.

Important references for learning much more about R are also available at the
R-Project Website, including An Introduction to R: Notes on R, a Programming
Environment for Data Analysis and Graphics, Version 2.4.1 (2006-12-18), by W. N. Venables,
D. M. Smith, and the R Development Core Team (2006), and R: A Language and
Environment for Statistical Computing Reference Index, Version 2.4.1 (2006-12-18), by
The R Development Core Team (2006a).

The R software is the GNU implementation of the famed S language. It has been
under active development by the R team, with contributions from many statisticians all
over the world. R has become a versatile and powerful platform for doing statistical
analysis. We shall confine our discussion to the Windows version of R. To obtain the
software, visit the Website at www.r-project.org. Click on CRAN on the left side of the
screen under Download. Scroll down the list of CRAN Mirror sites and click on the one
nearest to you geographically. Click on the link for Windows (or Linux or MacOS
X as appropriate) and click on the link named base. Finally, click on the link labeled
R-2.6.1-win32.exe. (This file indicates release 2.6.1, the latest available release as of
this writing. Newer versions come out frequently.) Save the file somewhere convenient,
for example, on your desktop. When the download finishes, double-click the program
icon and proceed with installing the software. (The discussion that follows assumes that
you accept all of the defaults during installation.) At the end of this appendix, on
page 468, you will find a listing and brief description of all the new or enhanced
functions that are contained in the TSA package.

Before you start the R software for the first time, you should create a folder or
directory, say Rwork, to hold data files that you will use with R for this project or
course. This will be the working directory whenever you use R for this particular project
or course. This directory is to contain the workspace, a file that contains all the
objects (variables and functions) created in an R session. You should create separate
working directories for different projects or different courses.† After R is
successfully installed on your computer, there will be an R shortcut icon on
your desktop. If you have created your working directory, start R by clicking
the R icon. When the software has loaded, you will have
a console window similar to the one shown in Exhibit 1 with a bottom line that reads >
followed by a large rectangular cursor (probably in red). This is the R prompt. You may
enter commands at this prompt, and they will be carried out when you press the Enter
key. Several tasks are available through the menus.

The first task is to save your workspace in the working directory you created. To do so,
select the File menu and then click on the choice Save workspace... You now
may either browse to the directory Rwork that you created (which may take many steps)
or type in the full path name; for example
“C:\Documents and Settings\JoeStudent\My Documents\Course156\Rwork”. If your working
directory is on a USB flash drive designated as drive E, you might simply enter “E:Rwork”.
Click OK, and from this point on in this session, R will use the folder Rwork as its
working directory.

You exit R by selecting Exit on the File menu. Every time you exit R, you will receive
a message asking whether or not to Save the workspace image. Click Yes to save
the workspace, and it will be saved in your current working directory. The next time you
want to resume work on that same project, simply navigate to that working directory and
locate the R icon there attached to the file named .RData. If you double-click this icon,
R will start with this directory already selected as the working directory and you can get
right to work on that project. Furthermore, you will receive the message
[Previously saved workspace restored].

Exhibit  1  shows  a  possible  screen  display  after  you  have  started  R,  produced  two 
different  graphs,  and  worked  with  R  commands  in  a  script  window  using  the  R  editor. 
Numerical  results  in  R  are  displayed  in  the  console  window.  Commands  may  be  entered 
(keyed)  in  either  the  console  window  and  executed  immediately  or  (better)  in  a  script 
window  (the  R  editor)  and  then  submitted  to  be  run  in  R.  The  Menu  bar  and  buttons  will 
change  depending  on  which  window  is  currently  the  “focus.” 


[File menu: Source R code..., New script, Open script..., Display file(s)..., Load Workspace...,
Save Workspace..., Load History..., Save History..., Change dir..., Print..., Save to File..., Exit]


† If you work in a shared computer lab, check with the lab supervisor for information about
starting R and about where you may save your work.

‡ If you neglected to create a working directory before starting R, you may do so at
this point. Navigate to a suitable place, click the Create new folder button, and
create the folder Rwork now.

Exhibit 1 Windows Graphical User Interface for the R Software

[Screenshot: the R GUI for Windows, showing the menu bar and buttons, the console window,
an active and an inactive graph window, and a script window (R Editor)]

[Packages menu: Load package..., Set CRAN mirror..., Select repositories..., Install package(s)...,
Update packages..., Install package(s) from local zip files...]

A particularly useful feature of R is the ease with which supplementary tools can be
added in the form of libraries or packages. For example, all the datasets and the new or
enhanced R functions used in this book are collected into a package called TSA that can
be downloaded and installed in R. This can be done by clicking the Packages menu and
then selecting Set CRAN mirror. Again select a mirror site that is closest to you
geographically, and a window containing the names of all available packages will pop up.

In addition to our TSA package, you will need to install packages named leaps, locfit,
MASS, mgcv, tseries, and uroot. Click the Packages menu once more, click Install
package(s), and scroll through the window. Hold down the Ctrl key and click on each of
these seven package names. When you have all seven selected, click OK, and they will be
installed on your system by R. You only have to install them once (but, of course, they
may be updated in the future and some of them may be incorporated into the core of R and
not need to be installed separately).
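
The same installation can also be carried out from the console instead of the menus. The
following is only a sketch, assuming an Internet connection and a usable CRAN mirror. The
helper name not.installed is ours (it is not part of R or of TSA), and the
install.packages line is left commented out so that nothing is downloaded by accident.

```r
# The seven packages named in the text above.
pkgs = c('TSA', 'leaps', 'locfit', 'MASS', 'mgcv', 'tseries', 'uroot')

# Hypothetical helper (not part of R): returns the members of 'pkgs'
# that cannot be found in the current library paths.
not.installed = function(pkgs) {
  found = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!found]
}

to.install = not.installed(pkgs)
to.install                      # names of the packages still to be installed

# Uncomment to download and install whatever is missing:
# install.packages(to.install)
```

Running install.packages with an empty vector does nothing, so the last line is harmless
once all seven packages are present.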

We will go over commands selected from the various chapters as a tutorial for R, but
before delving into those, we first present an overview of R. R is an object-oriented
language. The two main objects in R are data and functions. R admits many data
structures. The simplest data structure is a vector that contains raw data. To create a
data vector named Dat containing, say, 31, 4, 15, and 93, enter the following command
after the > prompt in the console window

Dat=c(31,4,15,93)

and then press the Enter key. The equal sign signifies assigning the object on its
right-hand side to the object on its left-hand side. The expression c(31,4,15,93) stands
for concatenating the numbers within the parentheses to make a vector. So, the command
creates an object named Dat that is a vector containing the numbers 31, 4, 15, and 93. R
is case-sensitive, so the objects named Dat and DAt are different. To reveal the contents
of an object, simply type the name of the object and press the Enter key. So, typing Dat
in the R console window (and pressing the Enter key) will display the contents of Dat. If
you subsequently enter DAt at the R prompt, R will complain by returning an error message
saying that object "DAt" is not found. The name of an object is a string of characters
that may contain letters, numerals, and the period sign, but the leading character is
required to be a letter.† For example, Abc123.a is a valid name for an R object but 12a
is not. R has some useful built-in objects, for example pi, which contains the numerical
value of π required for trigonometric operations and for computations such as the area of
a circle.
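
The points made above (vector creation, case sensitivity, and the built-in object pi) can
be tried out in a few lines. This is a small illustrative sketch; the names radius and
area are ours and are chosen only for the example.

```r
Dat = c(31, 4, 15, 93)   # create the data vector
Dat                      # typing the name displays the contents
length(Dat)              # number of elements in the vector

exists('Dat')            # TRUE: an object named Dat exists
exists('DAt')            # FALSE unless an object named DAt was created earlier

radius = 2               # illustrative value
area = pi * radius^2     # area of a circle, using the built-in object pi
area
```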

For us, the most useful data structure is a time series. A time series is a vector with
additional information on the epoch of the first datum and the number of data per basic
unit of time. For example, suppose we have quarterly data starting from the second
quarter of 2006: 12, 31, 22, 24, 30. This time series can be created as follows:

> Dat2=ts(c(12,31,22,24,30), start=c(2006,2), frequency=4)

Its content can be verified by the command

> Dat2
     Qtr1 Qtr2 Qtr3 Qtr4
2006        12   31   22
2007   24   30
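
The extra information carried by a time series object can be inspected directly. The
sketch below uses only base R functions (start, end, frequency, and time) on the same
five quarterly values.

```r
Dat2 = ts(c(12, 31, 22, 24, 30), start = c(2006, 2), frequency = 4)

start(Dat2)       # year and quarter of the first observation
end(Dat2)         # year and quarter of the last observation
frequency(Dat2)   # number of observations per year
time(Dat2)        # the time stamp of each observation, in decimal years
```

With five quarterly values starting in the second quarter of 2006, the last observation
falls in the second quarter of 2007, which is what end(Dat2) reports.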

Larger datasets already in a data file (raw data separated by spaces, tabs, or line
breaks) can be loaded into R by the command

> Dat2=ts(scan('file1'), start=c(2006,2), frequency=4)

where it is assumed that the data are contained in the file named file1 in the same
directory where you start up R (or the one changed into via the change dir command).
Notice that the file name, file1, is surrounded by single quotes ('). In R, all character
strings must be so enclosed. You may, however, use either single quotes or double quotes
(") as long as you use them in pairs.

† Certain names should be avoided, as they have special meanings in R. For example, the
letter T is short for true, F for false, and c for concatenate or combine.

Datasets with several variables may be read into R by the read.table function. The data
must be stored in a table form: The first row contains the variable names, and starting
from the second line, the data are stored so that data from each case make up a row in
the order of the variable names. The relevant command is

Dat3=read.table('file2',header=T)

where file2 is the name of the file containing the data. The argument header=T specifies
that the variable names are in the first line of the file. For example, let the contents
of a file named file2 in your working directory be as follows:

Y X
1 2
3 7
4 8
5 9

> Dat3=read.table('file2',header=T)
> Dat3
  Y X
1 1 2
2 3 7
3 4 8
4 5 9

Note that in displaying Dat3, R adds the row labels, defaulted to be from 1 to the number
of data cases. The output of read.table is a data.frame, which is a data structure for a
table of data. More discussion on data.frame can be found below. Presently, it suffices
to remember that the variables inside a data.frame are not directly accessible by name.
Think of Dat3 as a closed suitcase. It has to be opened before its variables are
accessible in an R session. The command to “open” a data.frame is to attach it:

> Y
Error: object "Y" not found
> attach(Dat3)
> Y
[1] 1 3 4 5
> X
[1] 2 7 8 9
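
Attaching is not the only way to reach the variables of a data.frame. As a sketch of the
alternatives, the following writes the small table above to a temporary file (so that no
file named file2 is needed) and then extracts the columns with the $ operator, with
double brackets, and with the with function, all of which are part of base R.

```r
# Write the small table from the text to a temporary file.
tmp = tempfile()
writeLines(c('Y X', '1 2', '3 7', '4 8', '5 9'), tmp)

Dat3 = read.table(tmp, header = TRUE)

Dat3$Y                 # column Y, extracted with the $ operator
Dat3[['X']]            # column X, extracted by name
with(Dat3, mean(Y))    # evaluate an expression using the columns of Dat3
```

These forms leave the “suitcase” closed, which avoids name clashes when several datasets
contain variables with the same name.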

R can also read in data from an Excel file saved in the csv (comma-separated values)
format, with the first row containing the variable names. Suppose file2.csv contains a
spreadsheet containing the same information as in file2. The command for reading in the
data from file2.csv is similar to the one for a text file.

> Dat4=read.csv('file2.csv',header=T)
> Dat4
  Y X
1 1 2
2 3 7
3 4 8
4 5 9

The functions scan, read.table, and read.csv have many other useful options. Use R Help
to learn more about them. For example, run the command ?read.table, and a window showing
detailed information for the read.table command will open. Remember that prefacing any
function name with a question mark will display the function's details in a new Help
window.
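
Besides the Help window, the argument list of a function can be examined in the console.
The sketch below uses the base R functions args and formals; help('read.table') is the
function form of ?read.table and is shown only as a comment.

```r
# help('read.table')            # equivalent to ?read.table

args(read.table)                # prints the argument list and default values

# Check programmatically that 'header' is one of the arguments.
arg.names = names(formals(read.table))
'header' %in% arg.names

# The default value of header in read.table.
formals(read.table)$header
```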

Functions in R are similar to functions in the programming language C. A function is
invoked by typing its name followed by a list of arguments enclosed by parentheses. For
example, the concatenate function has the name “c” and its purpose is to create a vector
obtained by concatenating the arguments supplied to the function.

> c(12,31,22,24,30)

By convention, no space is placed between the function name and the left parenthesis.
Even if the argument list is empty, the parentheses must be included in invoking a
function. Try the command

> c

R now sees the name of an object and will simply display its contents by printing the
entire set of commands making up the function in the console window. R has many useful
built-in functions, including abs, log, log10, exp, sin, cos, sqrt, and so forth, that
are useful for manipulating data. (The function abs computes the absolute value; log does
the log-transformation with base e, while log10 uses base 10; exp is the exponentiation
function; sin and cos are the trigonometric functions; and sqrt computes the square
root.) These functions are applied to a vector or a time series element by element. For
example, log(Dat2) log-transforms each element of the time series Dat2 and transfers the
time series structure to the transformed data.

> Dat2=ts(c(12,31,22,24,30), start=c(2006,2), frequency=4)
> log(Dat2)
         Qtr1     Qtr2     Qtr3     Qtr4
2006          2.484907 3.433987 3.091042
2007 3.178054 3.401197

Furthermore, vectors and time series can be manipulated algebraically with the usual
addition (+), subtraction (-), multiplication (*), division (/), or power (^ or **)
carried out element by element. For example, applying the transformation y = 2x^3 - x + 7
to Dat2 and saving the transformed data to a new time series named new.Dat2 can be easily
carried out by the command

new.Dat2=2*Dat2^3-Dat2+7
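
As a quick check of the element-by-element rule, the first value of Dat2 is 12, so the
first transformed value should be 2*12^3 - 12 + 7 = 3451. The sketch below verifies this
and confirms that the result keeps the time series structure of Dat2.

```r
Dat2 = ts(c(12, 31, 22, 24, 30), start = c(2006, 2), frequency = 4)
new.Dat2 = 2*Dat2^3 - Dat2 + 7

new.Dat2[1]                              # 3451, computed from the first value 12
is.ts(new.Dat2)                          # TRUE: still a time series
all.equal(time(new.Dat2), time(Dat2))    # TRUE: same time stamps as Dat2
```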
Now, we are ready to check out selected R commands used in Chapter 1 of the book. Script
files of the commands used in each of the fifteen chapters are available for download at
www.stat.uiowa.edu/~kchan/TSA.htm. The script files contain the R commands needed to
carry out the analyses shown in the chapters. They also contain a limited amount of
additional explanation. Download the scripts and save them in your working directory. You
may then open them within R in an R editor (script) window and you will save much typing!
Once they are downloaded, script files may be opened by either clicking the open file
button or by using the File menu.

Exhibit  2  A  Script  Window  with  Chapter  1  Scripts  Displayed 


[Screenshot: an R script window displaying the Chapter 1 commands for Exhibits 1.1
through 1.3, together with the right-click menu offering Undo, Cut, Copy, Paste, Delete,
and Select all]

Exhibit 2 shows a portion of the script file for Chapter 1 in a script window. The first
four commands have been highlighted by dragging the mouse pointer across them. They can
now all be executed by either pressing Control-R (Ctrl-R) or by right-clicking the
highlighted group and choosing Run from the choices displayed. If the cursor is in a
single command line with no highlighting, that one command may be executed similarly.
At the beginning of each session with R, you need to load the TSA library. The following
command will accomplish this (but you may wish to investigate the .First function that
can automate some startup tasks).

library(TSA)

The TSA package contains all datasets and functions needed for repeating the analyses
and doing the exercises.

# Exhibit 1.1 on page 2.
win.graph(width=4.875,height=2.5,pointsize=8)

Comments may be interspersed in the R code to improve its readability. The # sign in an R
command signifies that what follows the sign is a comment, and hence ignored by R. The
first R command opening with the # sign is therefore a comment. The second R command
opens a window for graphics that is 4.875 inches wide and 2.5 inches tall with characters
printed with point size 8. The chosen setting and similar settings produce time sequence
plots that are appropriate for inclusion in the book. Other settings will be appropriate
for other purposes. For example, quantile-quantile plots are best viewed with a 1:1
aspect ratio (height = width). For exploratory data analysis, you will want larger
graphics windows to use the full resolution of your computer screen to see more detail.
The command win.graph can be safely omitted altogether. If there is currently no open
graphics window, R will open a graphics window whenever a graphics command is issued. You
can resize this window in the usual ways by dragging edges or corners.

data(larain)

This loads the time series larain into the R session and makes it available for further
analysis such as

plot(larain,ylab='Inches',xlab='Year',type='o')

Plot is a function. It draws the time sequence plot for larain. The argument
ylab='Inches' specifies “Inches” as the label for the y-axis. Similarly, the label for
the x-axis is “Year.” The argument type indicates how the data are displayed in the plot.
For type='o', the individual data points are overplotted on the curve; type='b' (for
both) is another option that superimposes the data points on the curve, but with the
curve broken around the data points. For type='l', only the line segments connecting the
points are shown. (Note: This character (l) is an “el,” not a one.) To show only the data
points, supply the argument type='p'. To learn more about the plot function and the full
options for the type argument, run the command

?plot

A Help window on the plot function will then pop up for your browsing. Try it now. What
will be plotted if the option type='h' is used instead of type='o'? All graphs may be
saved (File > Save as > ...) in any of several graphics formats: jpeg, pdf, etc. Saved
graphs may then be imported into most word-processing programs to create high-quality
reports.
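
Graphs can also be written to a file from a script rather than through the File menu.
The following sketch uses the base R pdf device with the same width, height, and point
size as the win.graph call above. The file name and the short illustrative series are
assumptions made for the example, so that it runs without the TSA package.

```r
# Hypothetical output file, placed in R's temporary directory.
out = file.path(tempdir(), 'example-plot.pdf')

# Open a pdf graphics device: 4.875 by 2.5 inches, point size 8.
pdf(out, width = 4.875, height = 2.5, pointsize = 8)

# A short illustrative series (not a dataset from the book).
x = ts(c(12, 31, 22, 24, 30), start = c(2006, 2), frequency = 4)
plot(x, ylab = 'Value', xlab = 'Year', type = 'o')

dev.off()            # close the device so that the file is completed
file.exists(out)     # TRUE if the file was written
```

Until dev.off is called, the file is incomplete, so the device should always be closed
after the last plotting command.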

# Exhibit 1.2 on page 2.
win.graph(width=3,height=3,pointsize=8)
plot(y=larain,x=zlag(larain),ylab='Inches',
xlab='Previous Year Inches')

The plot function is a multipurpose function. It can do many different kinds of plots,
depending on the set of arguments passed to it and their attributes. Here, it draws the
scatter diagram of larain against its lag 1 values through the arguments y=larain (that
is, larain on the y-axis) and x=zlag(larain) (that is, the lag 1 of larain is on the
x-axis). Note that zlag is a function in the TSA package. Run the command ?zlag to learn
what you can do with it.

# Exhibit 1.3 on page 3.
data(color)
plot(color,ylab='Color Property',xlab='Batch',type='o')

Here we have supplied four arguments to the plot function to draw the time sequence plot
of the time series color. The first argument is simply color, but the other supplied
arguments are of the form name of the argument = argument value, so the first supplied
argument is an unnamed argument, while the other arguments are named arguments. You may
wonder how an unnamed argument is interpreted by R. To understand this, use the ?plot
command to check that the argument list of the plot function is x, y, and .... You may
guess that the x argument represents the x-variable, and the y argument the y-variable in
a plot. The ellipsis (...) argument stands for all other allowable arguments, which must,
however, be specified with the name of the argument. (Again, consult the help pages of
the plot function to figure out which other arguments besides x and y may be passed to
plot.) Any unnamed argument is interpreted to be the value for the argument whose order
matches that of the unnamed argument supplied to the function. For example, color appears
as the first argument supplied to the plot function, so R interprets it as the value for
the x argument. Now there is no value supplied to the y argument. In this case, plot will
examine the nature of the x-variable to determine what action to take. Since color is a
time series, plot draws a time sequence plot of color. To reinforce understanding, now
try the following command in which color appears twice in the argument list, as the first
and second arguments.

plot(color,color,ylab='Color Property',
xlab='Batch',type='o')

Can you guess what will be drawn by R? Now, color is interpreted as the x-variable and
also the y-variable; hence a 45 degree line is drawn. However, the line seems to be of
nonuniform thickness. (Can you see this?) Why? It is because, seeing that the variables
are time series, plot draws the line by connecting data points in the order they are
recorded, with the order of the data points marked in the plot. This feature can be
useful in some analyses but in this case it is distracting. A remedy is to strip the time
series attribute from the x-variable before plotting. (Plot takes its cue on how to do
the plot from the attribute of the x-variable.) To temporarily turn color into a raw data
vector, use the command

as.vector(color)

Now, try the command

plot(as.vector(color),color,ylab='Color Property',
xlab='Batch',type='o')
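
The argument-matching rules described above can be explored without drawing anything.
The sketch below defines a small function of our own, show.args (a hypothetical name, not
part of R), whose argument list is x, y, and ..., the same leading arguments as plot, and
reports how each supplied argument was interpreted.

```r
# Hypothetical function with the argument list x, y, ... (as in plot).
show.args = function(x, y, ...) {
  list(x = x,
       y.missing = missing(y),       # TRUE when no value was matched to y
       extra = names(list(...)))     # names of arguments absorbed by ...
}

r1 = show.args(10, ylab = 'Color Property')  # one unnamed argument: it becomes x
r2 = show.args(10, 20)                       # second unnamed argument becomes y
r3 = show.args(y = 20, 10)                   # y is named, so the unnamed 10 becomes x

r1$y.missing    # TRUE
r1$extra        # "ylab" was passed on through the ellipsis
r3$x            # 10
```

Named arguments are matched first, and the remaining unnamed arguments then fill the
unmatched positions in order, which is why r3 assigns 10 to x.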
# Exhibit 1.4 on page 4.
plot(y=color,x=zlag(color),ylab='Color Property',
xlab='Previous Batch Color Property')

The zlag function outputs an ordinary vector; that is, zlag(color) is the lag 1 of color,
but with its time series attribute stripped.

# Exhibit 1.9 on page 7.
plot(oilfilters,type='l',ylab='Sales')

Plot is a high-level graphics function and, as such, it will replace what is currently in
the graphics window or create a new graphics window if none exists. Recall that the
argument type='l' instructs plot to just draw the line segments connecting the individual
time series points.

Month=c('J','A','S','O','N','D','J','F','M','A','M','J')

creates a vector named Month that contains 12 elements that represent the 12 months of
the year beginning with July.

points(oilfilters,pch=Month)

Points is a low-level graphics function that draws on top of an existing graph. Since
oilfilters is a time series, points plots oilfilters against time order, but the argument
pch=Month instructs the points function to plot the data points using the successive
values of the Month vector as plotting symbols. So, the first point plotted is plotted as
a J, the second as an A, and so forth. When the values of Month are used up, they are
recycled; think of Month being replicated as Month, Month, Month,..., to make up any
deficiency. So, the 13th data point is plotted as a J and the 14th as an A. What letter
is used for the 30th data point?
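
The recycling rule can be made explicit with the base R function rep_len, which repeats a
vector until a requested length is reached. The sketch below reproduces the two cases
stated in the text (the 13th and 14th points); the name recycled is ours.

```r
Month = c('J','A','S','O','N','D','J','F','M','A','M','J')

# Month, Month, Month, ... truncated to 30 symbols.
recycled = rep_len(Month, 30)

recycled[13]               # "J", as stated in the text
recycled[14]               # "A"

# Equivalent index arithmetic for the k-th data point.
k = 14
Month[(k - 1) %% 12 + 1]   # "A"
```

Setting k to 30, or inspecting recycled[30], answers the question posed above.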

Alternatively, the exhibit can be reproduced by the following commands

plot(oilfilters,type='l',ylab='Sales')
points(y=oilfilters,x=time(oilfilters),
pch=as.vector(season(oilfilters)))

The time function outputs the epochs when the time series values were collected. The
season function returns the month of the data in oilfilters; season is a smart function,
as it returns the quarter of the data for quarterly data and so forth. The pch argument
expects a vector as its value, but the output of the season function has been designed to
be a factor object; hence the application of the as.vector function to
season(oilfilters) strips its factor attribute. (See more about factor objects on page
435.)

A good way to appreciate the natural variation in a stochastic process is to draw
realizations from the process and plot them in a time sequence plot. For example, the
independent and identically normally distributed process is often used as a data
generating mechanism for completely random data; that is, data with no temporal
structure. In other words, such data constitute a random sample from a normal
distribution that is drawn sequentially over time. Simulating data from such a process
and viewing their time sequence plots is a valuable exercise that can train our eyes to
differentiate whether a time series is random or dependent over time; cf. Exercise 1.3.
The R command for simulating and storing in a variable named y a random sample of size,
say n = 48, from a standard normal distribution is

y=rnorm(48)

The data can then be plotted using the command

plot(y,type='p',ylab='IID Normal Data')

Try the type='o' option in the above command. Which plotting option do you find better to
see the randomness in the data? Notice that executing the command y=rnorm(48) again will
yield a different time series realization of the random process. The set.seed command
discussed below addresses the issue of how to make simulations in R “reproducible.”

Data can be simulated from other distributions. For example, the command rt(n=48,df=5)
simulates 48 independent observations from a t-distribution with 5 degrees of freedom.
Similarly, rchisq(n=48,df=2) simulates a realization of size 48 from the chi-square
distribution with 2 degrees of freedom.
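
The remark that repeated calls give different realizations, while a fixed seed gives the
same one, can be checked directly. The sketch below anticipates the set.seed function
that is discussed later in the text; the seed value and the object names are chosen only
for illustration.

```r
set.seed(12345)
y1 = rnorm(48)          # first realization
y2 = rnorm(48)          # a second call gives a different realization

set.seed(12345)         # reset the generator to the same state
y3 = rnorm(48)          # reproduces y1 exactly

identical(y1, y3)       # TRUE
identical(y1, y2)       # FALSE

# Samples from the other distributions mentioned in the text.
t.sample = rt(n = 48, df = 5)
chisq.sample = rchisq(n = 48, df = 2)
```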
We show some R code to simulate your own random walk with, say, 60 independent standard
normal errors.

# Exhibit 2.1 on page 14.
n=60

This assigns the value of 60 to the object named n.

set.seed(12345)

This initializes the random number generator so that the simulation is reproducible if
needed.

sim.random.walk=ts(cumsum(rnorm(n)),freq=1,start=1)

The expression rnorm(n) generates n independent values from the standard normal
distribution. The function cumsum then computes the vector of cumulative sums of the
normally distributed sample, resulting in a random walk realization. The random walk
realization is then given the attribute of a time series and saved into the object named
sim.random.walk.

plot(sim.random.walk,type='o',ylab='Another Random Walk')

plots the simulated random walk.
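
The relation between the random walk and its errors can be verified numerically: the
first value of the walk equals the first error, and the successive differences of the
walk recover the remaining errors. The sketch below keeps the errors in an object named
e (our name) so that the comparison can be made.

```r
set.seed(12345)
n = 60
e = rnorm(n)                                   # the standard normal errors
sim.random.walk = ts(cumsum(e), frequency = 1, start = 1)

sim.random.walk[1] == e[1]                     # TRUE: the walk starts at e[1]

# First differences of the walk give back the errors e[2], ..., e[n].
all.equal(as.vector(diff(sim.random.walk)), e[-1])
```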
We now move to discuss some of the R commands appearing in Chapter 3.

# Exhibit 3.1 on page 31.
data(rwalk)

This command loads the time series rwalk, which is a random walk realization.

model1=lm(rwalk~time(rwalk))

The function lm fits a linear model (a regression model) with its first argument being a
formula. A formula is an expression including a tilde sign (~), the left-hand side of
which is the response variable and the right-hand side of which contains the covariates
or explanatory variables (separated by plus signs if there are two or more covariates).
By default, the intercept term is included in the model. The intercept can be removed by
including the term “-1” on the right-hand side of the tilde sign. Recall that
time(rwalk) yields a time series of the time epochs at which the random walk was sampled.
So the command lm(rwalk~time(rwalk)) fits a time trend regression model to the rwalk
series. The model fit is saved as the object named model1.

summary(model1)

The function summary prints out a summary of the fitted model passed to it. Hence the
command above prints out the fitted time trend regression model for rwalk.

# Exhibit 3.2 on page 31.
plot(rwalk,type='o',ylab='y')
abline(model1)

The function abline is a low-level graphics function. If a fitted simple regression model
is passed to it, it adds the fitted straight line to an existing graph. Any straight line
of the form y = β0 + β1x can be superimposed on the graph by running the command

abline(a=beta0,b=beta1)

For example, the following command adds a 45 degree line on the current graph.

abline(a=0,b=1)
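
The two uses of abline are connected through the fitted coefficients: the intercept and
slope stored in a fitted simple regression are exactly the a and b values that describe
its line. The sketch below uses a simulated random walk as a stand-in for rwalk (an
assumption, so that the code runs without the TSA package) and leaves the plotting calls
as comments.

```r
set.seed(12345)
y = ts(cumsum(rnorm(60)))        # simulated stand-in for the rwalk series
fit = lm(y ~ time(y))            # linear time trend, as in model1

b = coef(fit)                    # b[1] is the intercept, b[2] the slope
b

# Fitted values agree with intercept + slope * time.
all.equal(as.vector(fitted(fit)), as.vector(b[1] + b[2] * time(y)))

# On an existing plot of y, these two calls would draw the same line:
# abline(fit)
# abline(a = b[1], b = b[2])
```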

Recall the lm function can fit multiple regression models, with the covariates or
explanatory variables specified one by one, on the right side of the tilde sign (~) in
the formula. The covariates must be separated with a plus sign (+). Suppose we want to
fit a quadratic time trend model to the rwalk series. We need to create a new covariate
that contains the square of the time indices. The quadratic variable may be created
before invoking the lm function. Or it may be created on the fly when invoking the lm
function. The latter approach is illustrated here.

model1a=lm(rwalk~time(rwalk)+I(time(rwalk)^2))

Notice that the expression time(rwalk)^2 is enclosed within the I function, which
instructs R to create a new variable by executing the command passed into the I function.
The fitted quadratic trend model can be inspected with the summary function.


> summary(model1a)

Call:
lm(formula = rwalk ~ time(rwalk) + I(time(rwalk)^2))

Residuals:
      Min        1Q    Median        3Q       Max
-2.696232 -0.768018  0.008256  0.853365  2.344685

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -1.4272911  0.4534893  -3.147  0.00262 **
time(rwalk)       0.1746746  0.0343028   5.092 4.16e-06 ***
I(time(rwalk)^2) -0.0006654  0.0005451  -1.221  0.22721

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.132 on 57 degrees of freedom
Multiple R-Squared: 0.8167,  Adjusted R-squared: 0.8102
F-statistic: 127 on 2 and 57 DF,  p-value: < 2.2e-16

The summary function repeats the function call to the lm function. It then prints out the
five-number numerical summary of the residuals, followed by a table of the parameter
estimates with their standard errors, t-values and p-values. All significant covariates
are marked with asterisks (*); more asterisks mean higher significance, that is, smaller
p-value, as explained in the line labeled as Signif. codes. Finally, it outputs the
residual standard error, that is, the noise standard deviation estimate, and the multiple
R-squared of the fitted model. Clearly, the quadratic term is not significant so that it
is not needed, as is also obvious from the time plot of the series.

The reader may wonder why the I function is needed. This is because without the I
function, R interprets the term time(rwalk)+time(rwalk)^2 using the formula convention
(run ?formula to learn more about the formula convention), which results in fitting the
linear trend model! Refit the quadratic trend model but now omit the I function in the R
command, and compare the model fit with those of the linear and quadratic trend models.
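
The effect of omitting the I function can be seen by counting the fitted coefficients.
The sketch below uses a simulated series and a plain time index named tm (both are
assumptions standing in for rwalk and time(rwalk)), so that it runs without the TSA
package.

```r
set.seed(12345)
y = cumsum(rnorm(60))     # simulated stand-in for the rwalk series
tm = 1:60                 # time index

with.I = lm(y ~ tm + I(tm^2))    # quadratic trend: three coefficients
without.I = lm(y ~ tm + tm^2)    # ^ follows the formula convention here

names(coef(with.I))       # "(Intercept)" "tm" "I(tm^2)"
names(coef(without.I))    # "(Intercept)" "tm"
```

Without I, the term tm^2 is read as the crossing of tm with itself, which reduces to tm,
so only the intercept and the linear term are estimated.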

# Exhibit 3.3 on page 32.
data(tempdub)

This loads the tempdub series. You can learn more about the dataset tempdub by running
the command ?tempdub.

month.=season(tempdub)

The expression season(tempdub) outputs the monthly index of tempdub as a factor, and
saves it into the object named month. (the period is part of the name and is included to
make the printout from later commands clearer).

We now digress to explain what a factor is. A factor is a kind of data structure for
handling qualitative (nominal) data that do not have a natural ordering like numbers do.
However, for purposes of summary and graphics, the user may supply the levels argument to
indicate an ordering among the factor values. For example, the following command creates
a factor containing the qualitative variable sex, with the default ordering using the
dictionary order.

> sex=factor(c('M','F','M','M','F'))
> sex
[1] M F M M F
Levels: F M

We can change the ordering as follows:

> sex=factor(c('M','F','M','M','F'),levels=c('M','F'))
> sex
[1] M F M M F
Levels: M F

Note the swap of F and M in the levels. The function table counts the frequencies of the
two sexes.

> table(sex)
sex
M F
3 2

The printout lists the frequencies of the values according to the order supplied in the
levels argument. Now, we return to the R scripts in Chapter 3.
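
A factor stores its values as integer codes that point into the vector of levels, which
is why the order of the levels matters. The following sketch, using only base R, shows
the codes behind the sex factor and how as.vector returns the plain character values.

```r
sex = factor(c('M','F','M','M','F'), levels = c('M','F'))

levels(sex)          # "M" "F": the order supplied in the levels argument
as.integer(sex)      # 1 2 1 1 2: position of each value within the levels
as.vector(sex)       # the values as an ordinary character vector

counts = table(sex)
counts[['M']]        # 3
counts[['F']]        # 2
```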

model2=lm(tempdub~month.-1)

Recall that month. is a factor containing the month of the data. When a formula contains
a factor covariate, the function lm replaces the factor variable by a set of indicator
variables corresponding to each distinct level (value) of the factor. Here, month. has 12
distinct levels: Jan, Feb, and so forth. So, in place of month., lm creates 12 monthly
indicator variables. Because these 12 indicator variables add up to a vector of all ones,
they are linearly dependent with the intercept column, so the intercept term has to be
removed to avoid multicollinearity. The expression “-1” in the formula takes care of
this. The fitted model corresponds to fitting a mean separately for each month. If the
expression “-1” is omitted, lm deals with the multicollinearity by omitting the first
indicator variable; that is, the indicator variable for January will be deleted. In such
a fitted model, the intercept represents the January mean and the coefficients for other
months are the deviations of their means from the January mean.
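
The two parameterizations can be compared on a tiny example. The sketch below uses an
artificial factor g with three levels and six made-up responses (assumptions chosen so
that the group means are 2, 11, and 21), and the base R function model.matrix to display
the indicator variables that lm constructs.

```r
g = factor(rep(c('a','b','c'), each = 2))   # artificial three-level factor
y = c(1, 3, 10, 12, 20, 22)                 # group means are 2, 11, and 21

model.matrix(~ g - 1)      # one indicator column per level, no intercept

means.fit = lm(y ~ g - 1)  # coefficients are the three group means
int.fit = lm(y ~ g)        # intercept = mean of level a; others are deviations

coef(means.fit)            # 2, 11, 21
coef(int.fit)              # 2, 9, 19
```

Both fits give the same fitted values; only the meaning of the coefficients differs.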

summary (mode 12 ) 

A  summary  of  the  fitted  regression  model  is  printed  out  with  this  command.  Many  vari¬ 
ables  derived  from  the  fitted  model  can  also  be  easily  obtained.  For  example,  the  fitted 
values  can  be  printed  as 

fitted (mode 12 ) 

whereas  residuals  are  obtained  by  using 

residuals (model2) 

#  Exhibit  3.4  on  page  33. 

model3=lm (tempdub-month . )  #  intercept  is  automatically 
included  so  one  month  (January)  is  dropped 
summary (model 3 ) 

#  Exhibit  3.5  on  page  35. 
har.=harmonic(tempdub,1)

The first pair of harmonic functions (a cosine and sine pair) can be constructed by the
harmonic function, which takes a time series as its first argument and the number of
harmonic pairs as its second argument. Run ?harmonic to learn more about this function.
The output of the harmonic function is a matrix that is saved into an object named
har. Again, the period is part of the name and is included to make the later printouts
clearer.

model4=lm(tempdub~har.)
summary(model4)

We now briefly discuss the use of matrices in R. A matrix is a rectangular array of
numbers. It can be created by the matrix function. Here is an example:

> M=matrix(1:6,ncol=2)
> M
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

The matrix function expects a vector as its first argument, and it uses the values in the
supplied vector to fill up a matrix column by column. The column dimension of a matrix
is specified by the ncol argument and the row dimension by the nrow argument. The
expression 1:6 stands for the vector containing the integers from 1 to 6. So the matrix
function creates a matrix consisting of two columns using the six numbers 1, 2, 3, 4, 5,
and 6. Since the row dimension is missing, R infers it from the length of the data: six
elements in two columns require three rows, so the missing row dimension is set to 3.
The dimensions of a matrix can be extracted using the dim function.

>  dim(M) 

[1]  3  2 

This displays the row and column dimensions of M as a vector. The function apply
can process a matrix column by column, with each column operated on by a supplied
function. For example, the column means of M can be computed as follows:

> apply(M,2,mean)

[1]  2  5 

The first argument of the apply function is the matrix to be processed, and the
second argument is MARGIN, which should be set to 1 for row processing or 2 for
column processing. The third argument is FUN, which takes the user-specified function.
The example above instructs R to process M column by column and apply the mean
function to each column. How would you modify the preceding R command to compute
the row sums of M?
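
One possible answer, shown as a short sketch, is to switch MARGIN to 1 and FUN to sum. The object name row_sums is introduced here for illustration only.

```r
M <- matrix(1:6, ncol = 2)

# MARGIN = 1 processes M row by row; sum is applied to each row.
row_sums <- apply(M, 1, sum)
row_sums

# rowSums() is a built-in shortcut for the same computation.
rowSums(M)
```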

#  Exhibit  3.6  on  page  35. 

plot(ts(fitted(model4),freq=12,start=c(1964,1)),
 ylab='Temperature',type='l',
 ylim=range(c(fitted(model4),tempdub)))
points(tempdub)

The  ylim  option  ensures  that  the  y-axis  has  a  range  that  includes  both  the  raw  data  and 
the  fitted  values. 

#  Exhibit  3.8  on  page  43. 

plot(y=rstudent(model3),x=as.vector(time(tempdub)),
 xlab='Time',ylab='Standardized Residuals',type='o')

The expression rstudent(model3) returns the (externally) Studentized residuals
from the fitted model. To compute the (internally) standardized residuals, use the
command rstandard(model3).

#  Exhibit  3.11  on  page  45. 

hist(rstudent(model3),xlab='Standardized Residuals')

The function hist draws a histogram of the data passed to it as the first argument. Note
that the default heading of the histogram says that the plot is a histogram of
rstudent(model3). While the default main label correctly describes what is plotted,
it is often desirable to have a less technical but more descriptive label; for example, by
setting the option main='Histogram of the Standardized Residuals'.

#  Exhibit  3.12  on  page  45. 
qqnorm(rstudent(model3))

The expression rstudent(model3) extracts the standardized residuals of model3.
The qqnorm function then plots the Q-Q normal scores plot of the residuals. A reference
straight line can be superimposed on the Q-Q normal scores plot by running the
command qqline(rstudent(model3)).

#  Exhibit  3.13  on  page  47. 
acf(rstudent(model3))

The acf function computes the sample autocorrelation function of the time series
supplied to the function. The maximum number of lags is determined automatically based
on the sample size. It can, however, be changed to, say, 30 by setting the option
lag.max=30 when calling the function.

The Shapiro-Wilk test and the runs test on the residuals can be carried out,
respectively, by the following commands.

shapiro.test(rstudent(model3))
runs(rstudent(model3))

These commands compute the test statistics as well as their corresponding p-values.

# Exhibit 4.2 on page 59.
data(ma1.2.s)
plot(ma1.2.s,ylab=expression(Y[t]),type='o')

The software R can display mathematical symbols in a graph. The option
ylab=expression(Y[t]) specifies that the y label is Y with t as its subscript, all in
math font. Typesetting a formula does require some additional work. Read the help
pages for legend (?legend) and run the command demo(plotmath) to learn
more about this topic.

An MA(1) series with MA coefficient equal to $\theta_1 = -0.9$ and of length n = 100 can
be simulated by the following commands.

set.seed(12345)

This command initializes the seed of the random number generator so that a simulation
can be reproduced if needed. Without this command, the random generator will
initialize “randomly,” and there is no way to reproduce the simulation. The argument 12345
can be replaced by other numbers to obtain different random numbers.
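
The effect of the seed can be verified directly. In this small sketch the object names first_draw and second_draw are introduced for illustration.

```r
set.seed(12345)
first_draw <- rnorm(5)

set.seed(12345)          # reset the generator to the same seed
second_draw <- rnorm(5)

identical(first_draw, second_draw)   # the two sets of draws coincide
```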

y=arima.sim(model=list(ma=-c(-0.9)),n=100)

The arima.sim function simulates a time series from a given ARIMA model passed
into the function as a list that contains the AR and MA parameters as vectors. The
simulated model above is an MA(1) model, so there is no AR part in the model list. The
software R uses a plus convention in parameterizing the MA part, so we have to add a minus
sign before the vector of MA values to agree with our parameterization. The sample size
is determined by the value of the argument n. So, the command above instructs R to
simulate a realization of size 100 from an MA(1) model with $\theta_1 = -0.9$.
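
The sign convention can be checked without simulating, by comparing theoretical lag-1 autocorrelations. The sketch below assumes the book's MA(1) form Y[t] = e[t] - theta*e[t-1], for which the lag-1 autocorrelation is -theta/(1 + theta^2), and uses the base R function ARMAacf; the names theta, rho1_book, and rho1_R are illustrative.

```r
theta <- -0.9   # MA(1) coefficient in the book's convention: Y[t] = e[t] - theta*e[t-1]

# Theoretical lag-1 autocorrelation under the book's convention.
rho1_book <- -theta / (1 + theta^2)

# R's convention is Y[t] = e[t] + ma*e[t-1], so the book's theta is passed as ma = -theta.
rho1_R <- ARMAacf(ma = -theta, lag.max = 1)[["1"]]

c(book = rho1_book, R = rho1_R)
```

The two values agree only when the minus sign is applied, which is why the simulation command passes ma=-c(-0.9).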

We now digress to explain some pertinent facts about lists. A list is the most flexible
data structure in R. You may think of a list as a cabinet with many drawers (elements
or components), each of which contains data with possibly different data
structures. For example, an element of a list can be another list! The elements of a list
are ordered according to the order in which they are entered. Also, elements can be named
to facilitate their easy retrieval. A list can be created by the list function with elements
supplied as its arguments. The elements may be passed into the list function in the
form of name = value, delimited by commas. Below is an example of a list containing
three elements named a, b, and c, where a is a vector of length three, b is a
number, and c is a time series.

> list1 = list(a=c(1,2,3),b=4,c=ts(c(5,6,7,8),
   start=c(2006,2),frequency=4))
> list1
$a
[1] 1 2 3

$b
[1] 4

$c
     Qtr1 Qtr2 Qtr3 Qtr4
2006         5    6    7
2007    8

To retrieve an element of a list, run the command listname$elementname, for
example

> list1$c
     Qtr1 Qtr2 Qtr3 Qtr4
2006         5    6    7
2007    8

Data  of  irregular  structure  can  be  stored  as  a  list.  The  output  of  a  function  is  often  a  list. 
Simply  entering  the  name  of  a  list  may  result  in  dazzling  output  if  the  printed  list  is 
large.  An  alternative  is  to  first  explore  the  structure  of  a  list  by  the  function  str  (str 
stands  for  structure).  An  example  follows. 

> str(list1)
List of 3
 $ a: num [1:3] 1 2 3
 $ b: num 4
 $ c: Time-Series [1:4] from 2006 to 2007: 5 6 7 8

This shows that list1 has three elements and describes these elements briefly.
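
A few other ways of inspecting and indexing a list may be useful. The sketch below rebuilds the same list1 and adds a hypothetical nested list named nested to illustrate that an element can itself be a list.

```r
list1 <- list(a = c(1, 2, 3), b = 4,
              c = ts(c(5, 6, 7, 8), start = c(2006, 2), frequency = 4))

names(list1)          # names of the elements, in the order they were entered
list1[["a"]]          # same element as list1$a
list1[[2]]            # retrieval by position: the second element, b

# An element of a list can itself be a list.
nested <- list(inner = list1, note = "a list inside a list")
nested$inner$b
```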

# Exhibit 5.4 on page 91.

plot(diff(log(oil.price)),ylab='Change in Log(Price)',type='l')

The function diff outputs the first difference of the supplied time series. Higher-order
differences can be computed by supplying the differences argument. For example,
the second difference of log(oil.price) can be computed by the command

diff(log(oil.price),differences=2)

A useful convention of R is that the name of an argument in a function can be
abbreviated if it does not result in ambiguity. For example, the previous command can be
shortened to

diff(log(oil.price),diff=2)

Note that the second argument of the diff function is the lag argument. By default,
lag=1 and the diff function computes regular differences, that is, first or higher-order
differences. Later, when we deal with seasonal time series data, it will sometimes be
desirable to consider seasonal differences. For example, we may want to subtract the
number of the same month one year ago from this month’s number; that is, the differences
are computed with a lag of 12 months. This can be done by specifying lag=12. As an
illustration, the seasonal differences of period 12 can be computed by issuing the
command diff(tempdub,lag=12). What will be computed by the command
diff(log(oil.price),2)? One of the authors (KSC) committed a serious error,
more than once, when he tried to compute the second regular differences of some time
series by running a similar command with unnamed arguments. Instead of the second
regular differences, the first differences at lag 2 were actually computed by the
command with unnamed arguments! Imagine his frustrations of many anxious hours, all
because the data analysis from the flawed computations seriously conflicted with
expectations based on theory! The moral is that passing unnamed arguments to a function is
risky unless you know the positions of the relevant arguments very well. It is well to
remember that unnamed arguments, if present, should appear together at the beginning
of the argument list, and there should be no unnamed argument after a named one.
Indeed, mixed arguments (some named and some unnamed in a haphazard order) may
result in erroneous interpretation by R. The order of the arguments in a function can be
quickly checked by running the command args(function.name) or
?function.name, where function.name should be replaced by the name of the
function you are checking.
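
The pitfall is easy to reproduce on a toy series. In the sketch below, x is a hypothetical vector of squares, chosen so that its second regular differences are constant and easy to recognize.

```r
x <- c(1, 4, 9, 16, 25, 36)   # toy series (squares)

diff(x, differences = 2)   # second regular differences: all equal to 2
diff(x, lag = 2)           # first differences at lag 2: x[t] - x[t-2]
diff(x, 2)                 # the unnamed 2 is matched to lag, not to differences

args(diff.default)         # shows the argument order: x, lag, differences, ...
```

The third command reproduces the second one, not the first, which is exactly the mistake described above.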

#  Exhibit  5.11  on  page  102. 
library(MASS)

This loads the library MASS. Run the command library(help=MASS) to see the
content of this library.

boxcox(lm(electricity~1))

The function boxcox computes the maximum likelihood estimate of the power
transformation on the response variable to make a linear regression model appropriate for
the data. The first argument is a model fitted by the lm function. By default, the boxcox
function produces a plot of the log-likelihood function of the power parameter. The
MLE of the power parameter is the value that maximizes the plotted likelihood curve.
Here the model is that some power transform of electricity is given by the model of a
constant mean plus normally distributed white noise. But we already know that
electricity is serially correlated, so this method is not entirely correct, as the
autocorrelation in the series is not accounted for.

For time series analysis, a more appropriate model is that some power transform of
the time series variable follows an AR model. The function BoxCox.ar implements
this approach. It has two drawbacks in that it is much more computer-intensive and that
other covariates cannot be included in the model in the current version of the function.
The first argument of BoxCox.ar is the name of the time series variable. The AR
order may be supplied by the user through the order argument. If the AR order is
missing, the function estimates the AR order by minimizing the AIC for the
log-transformed data. Both boxcox and BoxCox.ar require the response variable to be
positive.

BoxCox.ar(electricity)

This  plots  the  log-likelihood  function  of  the  power  parameter  for  the  model  that 
accounts  for  autocorrelation  in  the  data. 

# Exhibit 6.9 on page 120.

acf(ma2.s,ci.type='ma',xaxp=c(0,20,10))

The argument ci.type='ma' instructs R to plot the sample ACF with the confidence
band for the kth lag ACF computed based on the assumption of an MA(k - 1) model.
See Equation (6.1.11) on page 112 for details.

#  Exhibit  6.11  on  page  121. 
pacf(ar1.s,xaxp=c(0,20,10))

This  calculates  and  plots  the  sample  PACF  function.  Run  the  command  ?par  to  learn 
more  about  the  xaxp  argument. 

#  Exhibit  6.17  on  page  124. 
eacf(arma11.s)

This computes the sample EACF (extended autocorrelation function) of the
data arma11.s. The maximum AR and MA orders can be set via the ar.max and
ma.max arguments. Their default values are seven and thirteen, respectively. For
example, eacf(arma11.s,ar.max=10,ma.max=10) computes the EACF with maximum
AR and MA orders of 10. The eacf function prints a table of symbols with X
standing for a significant value and O a nonsignificant value.

library(uroot)

This loads the uroot library and the following commands illustrate the computation of
the Dickey-Fuller unit-root test.

ar(diff(rwalk))

This  command  finds  the  AR  order  for  the  differenced  series,  which  is  order  8,  by  the 
minimum  AIC  criterion. 

ADF.test(rwalk,selectlags=list(mode=c(1,2,3,4,5,6,7,8),
 Pmax=8),itsd=c(1,0,0))

This computes the ADF test for the data rwalk. The selectlags argument takes a
list as its value. The mode argument specifies which lags must be included. If no
explicit lags are supplied, the Pmax argument sets the maximum lag and the ADF.test
function determines which lags to include in the test according to the method chosen by
setting mode to signf, aic, or bic. The option signf is the default value for mode, which
estimates a subset AR model by retaining only significant lags. The argument itsd
expects a vector; the first two elements are binary, indicating whether to include a
constant term (if the first element is 1) or a linear time trend (if the second element is 1);
and the third element is zero if there are no more covariates to include in the model. See
the help pages for the ADF.test function to learn more about it. Hence, the R
command instructs ADF.test to carry out the test with the null hypothesis that the model
has a unit root and an intercept term. The alternative is that the model is stationary, so a
small p-value implies stationarity!

ADF.test(rwalk,selectlags=list(Pmax=0),itsd=c(1,0,0))

In comparison, the preceding command carries out the ADF test with the null
hypothesis being that the model has a unit root and an intercept but no other lags, whereas
the alternative specifies that the model is a stationary AR(1) model with an intercept. If
itsd=c(0,0,0), then the alternative model is a centered stationary AR(1) model,
that is, with zero mean. Such a hypothesis is not relevant unless the data are already
mean-corrected.

#  Exhibit  6.22  on  page  132. 
set.seed(92397)

test=arima.sim(model=list(ar=c(rep(0,11),.8),
 ma=c(rep(0,11),0.7)),n=120)

This simulates a subset ARMA model. Here rep(0,11) stands for a sequence of 11
zeros.

res=armasubsets(y=test,nar=14,nma=14,y.name='test',
 ar.method='ols')

The armasubsets function computes various subset ARMA models, with the
maximum AR and MA orders specified by the nar and nma arguments, both set as 14 in the
example above. The associated AR models are estimated by the default method of ols
(ordinary least squares).

plot(res)

The  plot  function  is  a  smart  function.  Seeing  that  res  is  the  output  from  the 
armasubsets  function,  it  draws  a  table  indicating  several  of  the  best  subset  ARMA 
models. 

Below is a function that computes the method-of-moments estimator of the MA(1)
coefficient of an MA(1) model. It is a simple example of an R function. Simply copy and
paste it into the R console. Press the enter key to compile the code, and the function
estimate.ma1.mom will be created and then be available for use in your workspace.
This function only exists in the particular workspace where it was created.

estimate.ma1.mom=function(x){r=acf(x,plot=F)$acf[1];
if (abs(r)<0.5) return((-1+sqrt(1-4*r^2))/(2*r))
else return(NA)}

Readers uninterested in the specifics of R programming may skip down to the
material on Exhibit 7.1. The syntax of an R function takes the form

function.name = function(argument list){function body}

where function body is a set of R statements (commands). Normally, complete R
commands are separated by line breaks. Alternatively, they may be separated by the
semicolon symbol (;). If an R command is incomplete, R will assume that it is to be
continued on the next line and so forth until R reads a complete command. So the
function above has a single argument called x and contains two commands. The first one is

r=acf(x,plot=F)$acf[1]

which instructs R to compute the acf of x without plotting the values, extract the first
element of the computed sample acf function (that is, the lag 1 autocorrelation) and then
save it in an object called r. The object r is a local object; it only exists within the
estimate.ma1.mom function environment. The second command is

if (abs(r)<0.5)
return((-1+sqrt(1-4*r^2))/(2*r)) else return(NA)

Note the line break between the if clause and the second half of the command. Since the
if clause alone is incomplete, R assumes that it is to be continued on the next line. With
the second line, R finds a complete R command and so treats the two lines together as
a single command. In other words, R sees this command as equivalent to the following
one line:

if (abs(r)<0.5) return((-1+sqrt(1-4*r^2))/(2*r)) else return(NA)

The function abs computes the absolute value of the argument passed to it, whereas
sqrt is the function that computes the square root of its argument. Now, we are ready
to interpret the second command: if the absolute value of r, the lag 1 autocorrelation of
x, is less than 0.5, the function returns the number

(-1 + sqrt(1 - 4*r^2))/(2*r)

which is the method-of-moments estimator of the MA(1) coefficient $\theta_1$; otherwise the
function returns NA (see Equation (7.1.4) on page 150). The symbol NA is the code
standing for a missing value in R. (NA stands for not available.) In this example, R is
specifically instructed what value to return to the user. However, the default procedure is
that a function returns the value created by the last command in the function body. R
provides a powerful computer language for doing statistics. Please consult the
documents on the R Website to learn more about R programming.
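
The default return rule can be illustrated with a variant of the estimator that takes the lag 1 autocorrelation directly and has no explicit return call. The function name ma1_mom_from_r is hypothetical and is introduced only for this sketch.

```r
# Hypothetical helper: method-of-moments MA(1) estimate from a given lag-1 autocorrelation r.
ma1_mom_from_r <- function(r) {
  if (abs(r) < 0.5) {
    (-1 + sqrt(1 - 4 * r^2)) / (2 * r)   # value of the last evaluated expression is returned
  } else {
    NA                                   # no real solution when |r| >= 0.5
  }
}

ma1_mom_from_r(-0.4)   # theta = 0.5 corresponds to rho1 = -0.5/(1 + 0.25) = -0.4
ma1_mom_from_r(0.6)    # outside the admissible range, so NA
```

Separating the formula from the acf call also makes it possible to check the arithmetic against a known pair of values, as in the first call.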

#  Exhibit  7.1  on  page  152. 
data(ma1.2.s)

This loads a simulated MA(1) series.

estimate.ma1.mom(ma1.2.s)

This computes the MA(1) coefficient estimate by the method of moments using the
user-created estimate.ma1.mom function above!

data(ar1.s)

This loads a simulated AR(1) series from the TSA package.

ar(ar1.s,order.max=1,aic=F,method='yw')

This computes the AR coefficient estimates for the ar1.s series. The ar function
estimates the AR model for the centered data (that is, mean-corrected data), so the intercept
is zero and is neither estimated nor printed in the output. The user specifies the
maximum AR order through the order.max argument. The AR
order may be estimated by choosing the order, between 0 and the maximum order,
whose model has the smallest AIC. This option can be specified by setting the aic
argument to take the true value, that is, aic=T. Or we can switch off order selection by
specifying aic=F. In the latter case, the AR order is set to the maximum AR order. The
ar function can estimate the AR model using a number of methods, including solving
the Yule-Walker equations, ordinary least squares, and maximum likelihood estimation
(assuming normally distributed white noise error terms). These correspond to setting the
option method='yw', method='ols', or method='mle', respectively. In
particular, the preceding R command fits an AR(1) model for the ar1.s series by solving
the Yule-Walker equation.

We digress briefly to discuss the concept of a logical variable, which can take the
value TRUE or FALSE. These values can be abbreviated as T and F. In binary
representation, T is also represented by 1 and F by 0. R adopts the useful convention that a
logical variable appearing in an arithmetic expression will be automatically converted to 1
if it is a T and 0 otherwise.
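
This convention makes counting and computing proportions very compact. A small sketch with a made-up vector x:

```r
x <- c(3, 8, 1, 9, 4)

x > 3                 # a logical vector, one TRUE or FALSE per element of x
sum(x > 3)            # TRUE counts as 1, so this counts the values exceeding 3
mean(x > 3)           # proportion of values exceeding 3
TRUE + TRUE           # arithmetic on logical values
```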

#  Exhibit  7.6,  page  165. 
data(arma11.s)

arima(arma11.s,order=c(1,0,1),method='CSS')

The arima function estimates an ARIMA(p,d,q) model for the time series passed to it
as the first argument. The ARIMA order is specified by the order argument,
order=c(p,d,q), so the command above fits an ARMA(1,1) model to the data.
Estimation can be carried out by the conditional sum-of-squares method (method='CSS')
or maximum likelihood (method='ML'). The default estimation method is
maximum likelihood, with initial values determined by the CSS method. The arima
function prints out a summary of the fitted model. The fitted model may also be saved as
an object that can be further manipulated, for example, for model diagnostics. By
default, if d = 0, a stationary ARMA model will be fitted. Also, the fitted model is in the
centered form; that is, an ARMA model fitted to the series minus its mean. The
intercept term reported in the output of the arima function is a misnomer, as it is in fact
the mean! However, the mean so estimated generally differs slightly from the sample
mean.
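
The remark about the “intercept” can be checked on simulated data. The sketch below is only an illustration under stated assumptions: it uses the arima function from the stats package of base R (the TSA version is assumed to report the same coefficient), a simulated AR(1) series with coefficient 0.5 shifted to have mean 10, and illustrative object names x, fit, and theta0.

```r
set.seed(1)
# Simulated stationary AR(1) series with coefficient 0.5, shifted to have mean 10.
x <- 10 + arima.sim(model = list(ar = 0.5), n = 500)

fit <- arima(x, order = c(1, 0, 0))
coef(fit)      # the "intercept" is close to mean(x), that is, about 10
mean(x)

# Intercept of the regression form Y[t] = theta0 + phi*Y[t-1] + e[t].
theta0 <- unname(coef(fit)["intercept"] * (1 - coef(fit)["ar1"]))
theta0
```

The reported “intercept” tracks the sample mean, whereas the regression-type intercept theta0 is roughly the mean multiplied by one minus the AR coefficient.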

# Exhibit 7.10 on page 168.
res=arima(sqrt(hare),order=c(3,0,0))

This saves the fitted AR(3) model in the object named res. The output of the arima
function is a list. Run the command str(res) to find out what is saved in res.
You will find that most of the things in res are not directly useful. Instead, the output of
the arima function has to be processed by other functions for more informative
summaries. For example, (raw) residuals from the fitted model can be computed by the
residuals function via the command residuals(res). Fitted values can be
obtained by running fitted(res). Other useful functions for processing a fitted
ARIMA model from the arima function will be discussed below.

The  empirical  approach  of  using  the  bootstrap  to  do  inference  is  illustrated  below. 

set.seed(12345)

This initializes the seed of the random number generator so that the simulation study
can be repeated.

coefm.cond.norm=arima.boot(res,cond.boot=T,is.normal=T,
 B=1000,init=sqrt(hare))

The arima.boot function carries out a bootstrap analysis based on a fitted ARIMA
model. Its first argument is a fitted ARIMA model, that is, the output from the arima
function. Four different bootstrap methods are available: The bootstrap series can be
initialized by a supplied value (cond.boot=T) or not (cond.boot=F), and a
nonparametric bootstrap (is.normal=F) or a parametric bootstrap assuming normal
innovations (is.normal=T) can be used. For a conditional bootstrap, the initial
values can be supplied as a vector (the arima.boot function will use the initial values
from the supplied vector). The bootstrap sample size, say 1000, is specified by the
B=1000 option. The function arima.boot outputs a matrix with each row being the
bootstrap estimate of the ARIMA coefficients obtained by maximum likelihood
estimation with the bootstrap data. So, if B=1000 and the model is an AR(3), then the output
is a 1000 by 4 matrix where each row consists of the bootstrap AR(1), AR(2), and
AR(3) coefficients plus the mean estimate in that order $(\phi_1, \phi_2, \phi_3, \mu)$.

signif(apply(coefm.cond.norm,2,function(x)
 {quantile(x,c(.025,.975),na.rm=T)}),3)

This is a compound R statement. It is equivalent to the two commands

temp=apply(coefm.cond.norm,2,function(x)
 {quantile(x,c(.025,.975),na.rm=T)})
signif(temp,3)

except that the temporary variable temp is not created in the original compound
statement. Recall that the apply function is a general-purpose function for processing a
matrix. Here the apply function processes the matrix coefm.cond.norm column
by column, with each column supplied to the no-name user-supplied function

function(x){quantile(x,c(.025,.975),na.rm=T)}

This no-name function has one input, called x, that is processed by the quantile
function. The quantile function takes a vector and computes the sample quantiles
with the corresponding probabilities specified in the second argument. The third
argument of the quantile function is specified as na.rm=T (na stands for not available and
rm means remove), which means that any missing values in the input are discarded
before computing the quantiles. This specification is pivotal because by default the
quantile function cannot compute quantiles of a dataset with some missing values; it
stops with an error. (Some bootstrap series may have convergence problems upon fitting
an ARIMA model and hence the output of the bootstrap function may contain some
missing values.) To return to the interpretation of the command on the right-hand side of
temp, it instructs R to compute the 2.5th and 97.5th percentiles of each bootstrap
coefficient estimate. To enable precise calculations, R maintains many significant digits in
the numbers stored in an object. The printed version, however, usually requires fewer
significant digits for clarity. This can be done by the signif function. The signif function
outputs the object passed into it as first argument, but only to the number of significant
digits specified in the second argument, which is three in the example. Altogether, the
compound R command computes the 95% bootstrap confidence intervals for each AR
coefficient.
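
The same pattern can be tried on a small matrix without running a bootstrap. In the sketch below, boot is a hypothetical stand-in for a matrix of bootstrap estimates (five replicates of two coefficients, with one failed fit recorded as NA); the values are made up.

```r
# Hypothetical stand-in for a matrix of bootstrap estimates.
boot <- cbind(ar1  = c(0.50, 0.55, NA, 0.45, 0.60),
              mean = c(5.0, 5.2, 5.1, 4.9, 5.3))

# Column-by-column quantiles, discarding missing values first.
ci <- apply(boot, 2, function(x) {quantile(x, c(.025, .975), na.rm = TRUE)})

signif(ci, 3)   # rows are the 2.5% and 97.5% quantiles, columns the coefficients
```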

# Exhibit 8.2 on page 177.
data(hare)

m1.hare=arima(sqrt(hare),order=c(3,0,0))
m1.hare

This prints the fitted AR(3) model for the square-root-transformed hare data. The AR(2)
coefficient estimate ($\hat{\phi}_2$) turns out not to be significant. Note that the AR(2)
coefficient is the second element in the coefficient vector, as shown in the printout of the
fitted model. A constrained ARIMA model with some elements fixed at certain values can be
fitted by using the fixed argument in the arima function. The fixed argument
should be a vector of the same length as the coefficient vector, with its elements set to NA
for all of the free elements but set to zero (or another fixed value) for all of the
constrained coefficients. For example, here the AR(2) coefficient is constrained to be zero
($\phi_2 = 0$) and hence fixed=c(NA,0,NA,NA), that is, the AR(1), AR(3), and the
“intercept” term are free parameters, whereas the AR(2) is fixed at 0. Remember that
the “intercept” term is last. Below is the command for fitting the constrained AR(3)
model for the hare data.

m2.hare=arima(sqrt(hare),order=c(3,0,0),
 fixed=c(NA,0,NA,NA))
m2.hare

Note that the intercept term is actually the mean in the centered form of the ARMA
model; that is, if y = sqrt(hare) - intercept, then the model is

$$y_t = 0.919\,y_{t-1} - 0.5313\,y_{t-3} + e_t$$

so the “true” estimated intercept equals 5.6889*(1 - 0.919 + 0.5313) = 3.483, as stated
in the text!
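
The arithmetic above can be reproduced directly. This short sketch simply re-enters the numbers quoted in the paragraph; the object names mu, phi, and theta0 are our own.

```r
# Values quoted in the text for the constrained AR(3) fit.
mu  <- 5.6889                  # "intercept" reported by arima (the series mean)
phi <- c(0.919, 0, -0.5313)    # AR(1), AR(2) (fixed at 0), AR(3)

# Regression-form intercept: theta0 = mu * (1 - phi1 - phi2 - phi3)
theta0 <- mu * (1 - sum(phi))
print(round(theta0, 3))
```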

plot(rstandard(m2.hare),
ylab='Standardized Residuals',type='b')

The function rstandard computes the standardized residuals; that is, the raw residuals
normalized by the estimated noise standard deviation.

abline(h=0)

adds a horizontal line to the plot with zero y-intercept. Use the help in R to find out how
to add a vertical line with x-intercept = 10.

# Exhibit 8.12 on page 185 (prefaced by some commands in
# Exhibit 8.1 on page 176)
data(color)

m1.color=arima(color,order=c(1,0,0))
tsdiag(m1.color,gof=15,omit.initial=F)

The tsdiag function in the TSA package has been modified from that in the stats
package of R. It performs model diagnostics on a fitted model. The argument gof specifies
the maximum number of lags in the acf function used in the model diagnostics.
Setting the argument omit.initial=T omits the few initial residuals from the analysis.
This option is especially useful for checking seasonal models where the initial
residuals are close to zero by construction and including them may skew the model
diagnostics. In the example, the omit.initial argument is set to F so that the
diagnostics are done with all residuals. Recall that the Ljung-Box (portmanteau) test statistic
equals the weighted sum of the squared residual autocorrelations from lags 1 to K,
say; see Equation (8.1.12) on page 184. Assuming that the ARIMA orders are correctly
specified, the validity of the approximate chi-square distribution for the Ljung-Box test
statistic requires that K be larger than the lag beyond which the original time series has
negligible autocorrelation. The modified tsdiag function in the TSA package checks
this requirement; consequently the Ljung-Box test is only computed for sufficiently
large K. If the required K is larger than the specified maximum lag, tsdiag will return
an error message. This problem can be solved by increasing the maximum lag asked for.
Use ?tsdiag to learn more about the modified tsdiag function.
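
To see the Ljung-Box statistic as a weighted sum of squared residual autocorrelations, the following sketch computes it by hand and compares it with the Box.test function of the stats package. It uses a simulated AR(1) series rather than the color data; the seed, the AR coefficient 0.5, and the object names are illustrative assumptions, and this is not the code used inside tsdiag.

```r
set.seed(123)
x   <- arima.sim(model = list(ar = 0.5), n = 200)   # simulated AR(1) data
fit <- arima(x, order = c(1, 0, 0))
res <- residuals(fit)

K <- 15                      # maximum lag, playing the role of gof=15
n <- length(res)

# Weighted sum of squared residual autocorrelations at lags 1..K.
r <- acf(res, lag.max = K, plot = FALSE)$acf[-1]
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(K)))

# fitdf = number of estimated ARMA coefficients (p + q = 1 here), so the
# reference distribution is chi-square with K - 1 degrees of freedom.
lb <- Box.test(res, lag = K, type = "Ljung-Box", fitdf = 1)
print(c(by.hand = Q, Box.test = unname(lb$statistic)))
```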

# Exhibit 9.2 on page 205.
data(tempdub)

tempdub1=ts(c(tempdub,rep(NA,24)),start=start(tempdub),
freq=frequency(tempdub))

This appends two years of missing values to the tempdub data, as we want to forecast
the temperature for two years into the future. The function start extracts the starting
date of a time series. The function frequency extracts the frequency of the time series
passed to it, here being 12. Hence, tempdub1 contains the Dubuque temperature series
augmented by two years of missing data, with the same starting date and frequency of
sampling per unit time interval.

har.=harmonic(tempdub,1)

This creates the first pair of harmonic functions.

m5.tempdub=arima(tempdub,order=c(0,0,0),xreg=har.)

This fits the harmonic regression model using the arima function. The covariates are
passed to the function through the xreg argument. In the example, har. is the covariate
and the arima function fits a linear regression model of the response variable on the
covariate, with the errors assumed to follow an ARIMA model. Because the specified
ARIMA orders are p = d = q = 0, the presumed error structure is white noise; that is, the
arima function fits an ordinary linear regression model of tempdub on the first pair
of harmonic functions. Note that the result is the same as that from the fit using the lm
function, which can be verified by the following commands:

har.=harmonic(tempdub,1); model4=lm(tempdub~har.)
summary(model4)
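
The agreement between arima with order c(0,0,0) and lm can also be checked without the TSA package. In this sketch the cosine and sine covariates are built by hand (an assumption standing in for the harmonic function), and the built-in monthly series nottem replaces tempdub; the object names are our own.

```r
y  <- nottem                          # built-in monthly temperature series
tt <- as.numeric(time(y))             # time in years, e.g. 1920.000, 1920.083, ...

# First pair of harmonic functions with period one year.
X <- cbind(cos1 = cos(2 * pi * tt), sin1 = sin(2 * pi * tt))

fit_lm    <- lm(as.numeric(y) ~ X)                  # ordinary least squares
fit_arima <- arima(y, order = c(0, 0, 0), xreg = X) # regression with white noise errors

# Both print intercept, cosine, and sine coefficients in the same order.
print(round(coef(fit_lm), 3))
print(round(coef(fit_arima), 3))
```

The two coefficient vectors should agree up to the tolerance of the numerical optimizer used by arima.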

The xreg argument expects the covariate input either as a matrix or a
data.frame. A data.frame can be thought of as a matrix made up by binding
together several covariates column by column. It can be created by the data.frame
function with multiple arguments, each of which takes the form covariate.name =
R statement for computing the covariate. If the covariate.name is omitted, the
R statement becomes the covariate name, which may be undesirable for a complex
defining statement. If the R statement is a matrix, its columns are taken as covariates
with the column names taken as the covariate names. Consider the example of augmenting
the harmonic regression model above by a linear time trend. The augmented model
can be fitted by the command

arima(tempdub,order=c(0,0,0),
xreg=data.frame(har.,trend=time(tempdub)))

m5.tempdub

This prints the fitted model.

We now illustrate prediction with an example.

newhar.=harmonic(ts(rep(1,24),start=c(1976,1),freq=12),1)

This creates the harmonic functions over two years starting from January 1976. Remember
that the tempdub series ends in December 1975.

plot(m5.tempdub,n.ahead=24,n1=c(1972,1),newxreg=newhar.,
col='red',type='b',ylab='Temperature',xlab='Year')

This computes and plots the forecasts based on the fitted model passed as the first argument.
Here, we specify a forecast for 24 steps ahead through the argument
n.ahead=24. The covariate values over the period of forecast have to be supplied by
the newxreg argument. The newxreg argument should match the xreg argument in
terms of the covariates except that their values are from different periods. The plot may
be drawn with a starting date different from the start date of the time series data by using
the n1 argument. Here, n1=c(1972,1) specifies January 1972 as the start date for
the plot. For nonseasonal data (that is, frequency = 1), n1 should be a scalar. The col
and type arguments refer to the color and style of the plotted lines.
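
The forecasts themselves, without the plot, can be obtained with the predict function of the stats package. The sketch below is a base-R analogue under stated assumptions: the built-in nottem series (which ends in December 1939) stands in for tempdub, and the helper harm, written here for illustration, stands in for harmonic.

```r
y <- nottem                                   # monthly series ending December 1939

# Hypothetical helper: first pair of harmonic functions at the given times.
harm <- function(tt) cbind(cos1 = cos(2 * pi * tt), sin1 = sin(2 * pi * tt))

X   <- harm(as.numeric(time(y)))
fit <- arima(y, order = c(0, 0, 0), xreg = X)

# Covariate values for the 24 months that follow the end of the series.
future <- as.numeric(time(ts(rep(1, 24), start = c(1940, 1), frequency = 12)))
newX   <- harm(future)                        # same columns as X, new period

fc <- predict(fit, n.ahead = 24, newxreg = newX)
print(round(head(fc$pred, 3), 1))             # point forecasts
print(round(head(fc$se, 3), 2))               # forecast standard errors
```

Because the fitted mean is a one-year harmonic and the errors are white noise, the second year of forecasts repeats the first.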

# Exhibit 9.3 on page 206.
data(color)

m1.color=arima(color,order=c(1,0,0))
plot(m1.color,n.ahead=12,col='red',type='b',xlab='Year',
ylab='Temperature')
abline(h=coef(m1.color)
[names(coef(m1.color))=='intercept'])

The final command adds the horizontal line at the estimated mean (intercept). This is a
complex statement. The expression coef(m1.color) extracts the coefficient vector.
The components of the coefficient vector are named. The names of a vector can be
extracted by the names function, so names(coef(m1.color)) returns the vector of
names of the components of the coefficient vector. The == operator compares the two
vectors on its two sides element by element, resulting in a vector consisting of TRUEs
and FALSEs depending on whether the elements are equal or not. (If the vectors under
comparison are of unequal length, R recycles the shorter one repeatedly to match the
longer one.) Hence, the expression inside the brackets,

names(coef(m1.color))=='intercept'

returns a vector with the TRUE value in the position in which the “intercept” component
lies and with all other elements FALSE. Finally, the intercept coefficient estimate is
extracted by the “bracket” operation:

coef(m1.color)[names(coef(m1.color))=='intercept']

The operation within brackets subsets a vector using one of two mechanisms. Let v be a
vector. A subvector of it can be formed by the command v[s], where s is a Boolean
vector (that is, consisting of TRUEs and FALSEs) that is of the same length as v. The
vector v[s] is then a subvector of v consisting of those elements of v for which the
corresponding element in s is TRUE; elements in v whose corresponding element in s
is FALSE are discarded from v[s].

A second way to subset a vector is to construct s so that it contains the positions of
the elements to be retained; v[s] will then return the desired subvector. A variation of
this approach is to form a subvector by deletion. Unwanted elements are designated by
giving their positions multiplied by -1. An illustration follows.

> v=1:5

This creates a vector containing the first five positive integers.

> v
[1] 1 2 3 4 5
> names(v)
NULL

By default, the components of v are unnamed, so names(v) returns an empty vector
denoted by the object NULL.

> names(v)=c('A','B','C','D','E')

This is the method of assigning names to the components of a vector.

> v
A B C D E
1 2 3 4 5

The command

> names(v)=='C'
[1] FALSE FALSE  TRUE FALSE FALSE

finds which component of names(v) is “C.”

The command

> v[names(v)=='C']
C
3

subsets v by Boolean extraction.

The command

> v[3]
C
3

subsets v by supplying the positions of the retained elements.

The command

> v[-3]
A B D E
1 2 4 5

subsets v by supplying the positions of the unwanted elements.

The theoretical ACF of a stationary ARMA process can be computed by the ARMAacf
function. The ar parameter vector, if present, is to be passed into the function via the ar
argument. Similarly, the ma parameter vector is passed into the function via the ma
argument. The maximum lag may be specified by the lag.max argument. Setting the
pacf argument to TRUE computes the theoretical pacf; otherwise the function computes
the theoretical acf. Consider as an example the seasonal MA model:

$$Y_t = (1 + 0.5B)(1 + 0.8B^{12})e_t$$

Note that $(1 + 0.5B)(1 + 0.8B^{12}) = 1 + 0.5B + 0.8B^{12} + 0.4B^{13}$, so the ma coefficients
are specified by the option ma=c(0.5,rep(0,10),0.8,0.4). Its theoretical ACF
is displayed on the left side of Exhibit 10.3, which can be done by the following R commands.

plot(y=ARMAacf(ma=c(0.5,rep(0,10),0.8,0.4),
lag.max=13)[-1],x=1:13,type='h',
xlab='Lag k',ylab=expression(rho[k]),axes=F,ylim=c(0,0.6))
points(y=ARMAacf(ma=c(0.5,rep(0,10),0.8,0.4),
lag.max=13)[-1],x=1:13,pch=20)
abline(h=0)
axis(1,at=1:13,
labels=c(1,NA,3,NA,5,NA,7,NA,9,NA,11,NA,13))
axis(2)
text(x=7,y=.5,labels=expression(list(theta==-0.5,
Theta==-0.8)))

As the labeling of the figure requires Greek letters and subscripts, the label
information has to be passed via the expression function. Run the help command
?plotmath to learn more about how to do mathematical annotations in R.
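
The expansion of the seasonal MA polynomial and the resulting autocorrelations can be checked numerically. In this sketch the two factors are multiplied term by term (the loop and the object names p1, p2, full, ma, and rho are our own), and the product is passed to ARMAacf.

```r
p1 <- c(1, 0.5)                  # coefficients of 1 + 0.5 B
p2 <- c(1, rep(0, 11), 0.8)      # coefficients of 1 + 0.8 B^12

# Multiply the two polynomials: full[k + 1] is the coefficient of B^k.
full <- numeric(length(p1) + length(p2) - 1)
for (i in seq_along(p1)) {
  for (j in seq_along(p2)) {
    full[i + j - 1] <- full[i + j - 1] + p1[i] * p2[j]
  }
}
ma <- full[-1]                   # drop the leading 1; lags 1 to 13
print(ma)

# Theoretical ACF; the result is named by lag, starting at lag 0.
rho <- ARMAacf(ma = ma, lag.max = 13)
print(round(rho[c("1", "11", "12", "13")], 4))
```

Only lags 1, 11, 12, and 13 carry nonzero autocorrelation, which is why the plotting commands above use lag.max=13.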

# Exhibit 10.10 on page 237
m1.co2=arima(co2,order=c(0,1,1),
seasonal=list(order=c(0,1,1),period=12))

The argument seasonal supplies the information on the seasonal part of the seasonal
ARIMA model. It expects a list with the seasonal order supplied in the component
named order and the seasonal period entered via the period component, so the command
above instructs the arima function to fit a seasonal ARIMA$(0,1,1)\times(0,1,1)_{12}$
model to the co2 series.

m1.co2

This prints a summary of the fitted seasonal ARIMA model.

# Exhibit 11.5 on page 255.

acf(as.vector(diff(diff(window(log(airmiles),
end=c(2001,8)),12))),lag.max=48)

The expression window(log(airmiles),end=c(2001,8)) subsets the
log(airmiles) time series by specifying a new end date of August 2001. The subseries
is first seasonally differenced with lag 12 and then regularly differenced. The
doubly differenced series is then passed to the acf function for computing the sample
ACF out to 48 lags.
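
The window and diff steps can be tried on a built-in series. The sketch below uses AirPassengers as a stand-in for airmiles and an arbitrary end date of August 1958; both are assumptions for illustration. It also confirms that the order of the seasonal and regular differencing does not matter.

```r
x   <- log(AirPassengers)              # built-in monthly series, 1949 to 1960
sub <- window(x, end = c(1958, 8))     # keep data up to August 1958
print(end(sub))

# Seasonal difference (lag 12) followed by a regular difference (lag 1).
dd  <- diff(diff(sub, lag = 12))

# Regular difference first, then the seasonal difference: same result.
dd2 <- diff(diff(sub), lag = 12)

# Thirteen observations are lost: 12 to the seasonal and 1 to the regular difference.
print(length(sub) - length(dd))
```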

# Exhibit 11.6 on page 255.

air.m1=arimax(log(airmiles),order=c(0,1,1),seasonal=
list(order=c(0,1,1),period=12),
xtransf=data.frame(I911=1*(seq(airmiles)==69),
I911=1*(seq(airmiles)==69)),
transfer=list(c(0,0),c(1,0)),
xreg=data.frame(Dec96=1*(seq(airmiles)==12),
Jan97=1*(seq(airmiles)==13),
Dec02=1*(seq(airmiles)==84)),method='ML')

The arimax function extends the arima function so that it can handle intervention
analysis and outliers (both AO and IO) in time series. It is assumed that the intervention
affects the mean function of the process, with the deviation from the unperturbed mean
function modeled as the sum of the outputs of an ARMA filter of a number of covariates;
the deviation is known as the transfer function. The covariates making up the transfer
function are passed to the arimax function via the xtransf argument in the form
of a matrix or a data.frame. For each such covariate, its contribution to the transfer
function takes the form of a dynamic response given by

$$\frac{a_0 + a_1 B + \cdots + a_q B^q}{1 - b_1 B - b_2 B^2 - \cdots - b_p B^p}\,\text{covariate}_t$$

The transfer function is the sum of the dynamic responses, in the form of some ARMA
filter, of all covariates in the xtransf argument. The ARMA order of the filter is
denoted by the vector c(p,q). If p = q = 0 (that is, c(p,q) = c(0,0)), the contribution
of the covariate is of the form $a_0\,\text{covariate}_t$. If c(p,q) = c(1,0), the output
becomes

$$\frac{a_0}{1 - b_1 B}\,\text{covariate}_t = a_0\left(\text{covariate}_t + b_1\,\text{covariate}_{t-1} + b_1^2\,\text{covariate}_{t-2} + \cdots\right)$$

The ARMA orders for the dynamic components of the transfer function are supplied via
the transfer argument as a list containing the vectors of ARMA orders in the order
of the covariates defined in the xtransf argument. Hence, the options:

xtransf=data.frame(I911=1*(seq(airmiles)==69),
I911=1*(seq(airmiles)==69)),
transfer=list(c(0,0),c(1,0))

instruct the arimax function to create two identical covariates called I911, which is
an indicator variable, say $P_t$, that equals 1 in September 2001 and 0 otherwise, and the
transfer function is the sum of two ARMA filters of the 9/11 indicator variable of
orders c(0,0) and c(1,0), respectively. Hence the transfer function equals

$$\omega_0 P_t + \frac{\omega_1}{1 - \omega_2 B}\,P_t$$

This is equivalent to an ARMA(1,1) filter of the form

$$\frac{(\omega_0 + \omega_1) - \omega_0\omega_2 B}{1 - \omega_2 B}\,P_t$$

which can be specified by the following options

xtransf=data.frame(I911=1*(seq(airmiles)==69)),
transfer=list(c(1,1))

Additive outliers (AO) in a time series can be incorporated as indicator variables
passed to the xreg argument. For example, three potential AOs are included in the
model by the following supplied argument:

xreg=data.frame(Dec96=1*(seq(airmiles)==12),
Jan97=1*(seq(airmiles)==13),
Dec02=1*(seq(airmiles)==84))

Note that the first potential outlier occurs in December 1996. The corresponding indicator
variable is labeled as Dec96 and is computed by the formula
1*(seq(airmiles)==12), which results in a vector that equals 0 except its twelfth
element, which equals 1, and the vector is of the same length as airmiles. Some specifics
of this “simple” command follow. The function seq creates a vector consisting of
the first n positive integers, where n is the length of the vector passed to the seq function.
The expression seq(airmiles)==12 creates a vector of the same length as
airmiles, and its elements are all FALSE except that the twelfth element is TRUE.
Then 1*(seq(airmiles)==12) is an arithmetic expression for which R automatically
converts any embedded Boolean vector (seq(airmiles)==12) to a binary
vector. Recall that the TRUE values are converted to 1s and the FALSE values to 0s.
Multiplying by 1 does not alter the converted binary vector. Indeed, multiplication is
employed to trigger the conversion from the Boolean values to binary values.
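
The construction of such an indicator can be followed step by step on a small placeholder series. The series x below is an assumption made for illustration (24 monthly values starting in January 1996, so that position 12 is December 1996); it is not the airmiles data.

```r
x <- ts(1:24, start = c(1996, 1), frequency = 12)   # placeholder monthly series

flag <- seq(x) == 12      # Boolean vector: TRUE only in the twelfth position
ind  <- 1 * flag          # arithmetic converts TRUE/FALSE to 1/0

print(flag[10:14])
print(ind[10:14])
print(cycle(x)[12])       # month number of the twelfth observation
```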

For this example, the unperturbed process is assumed to be an IMA(1,1) process, as
is evident from the supplied argument order=c(0,1,1). In general, a seasonal
ARIMA unperturbed process is specified in the same way that it is specified for the
arima function.

air.m1

This prints out the fitted intervention model, as displayed below.

> air.m1
Call: arimax(x=log(airmiles),order=c(0,1,1),seasonal=
list(order=c(0,1,1),period=12),xreg=data.frame(Dec96=
1*(seq(airmiles)==12),Jan97=1*(seq(airmiles)==13),
Dec02=1*(seq(airmiles)==84)),method='ML',
xtransf=data.frame(I911=1*(seq(airmiles)==69),I911=1*
(seq(airmiles)==69)),transfer=list(c(0,0),c(1,0)))

Coefficients:
          ma1     sma1   Dec96    Jan97   Dec02  I911-MA0  I911.1-AR1  I911.1-MA0
      -0.3825  -0.6499  0.0989  -0.0690  0.0810   -0.0949      0.8139     -0.2715
s.e.   0.0926   0.1189  0.0228   0.0218  0.0202    0.0462      0.0978      0.0439

sigma^2 estimated as 0.000672: log likelihood=219.99, aic=-423.98

Note that the parameter in the transfer-function component defined by the first instance
of the indicator variable I911 is labeled as I911-MA0; that is, the MA(0) coefficient.
The transfer-function components defined by the second instance of the indicator variable
I911 are labeled as I911.1-AR1 and I911.1-MA0. These are the AR(1) and
MA(0) coefficient estimates.

We can also try the equivalent parameterization of specifying an ARMA(1,1) filter
on the 9/11 indicator variable.

> air.m1a=arimax(log(airmiles),order=c(0,1,1),
seasonal=list(order=c(0,1,1),period=12),
xtransf=data.frame(I911=1*(seq(airmiles)==69)),
transfer=list(c(1,1)),
xreg=data.frame(Dec96=1*(seq(airmiles)==12),
Jan97=1*(seq(airmiles)==13),
Dec02=1*(seq(airmiles)==84)),method='ML')
> air.m1a
Call: arimax(x=log(airmiles),order=c(0,1,1),seasonal=
list(order=c(0,1,1),period=12),xreg=data.frame(Dec96=1
*(seq(airmiles)==12),Jan97=1*(seq(airmiles)==13),Dec02=
1*(seq(airmiles)==84)),method='ML',xtransf=
data.frame(I911=1*(seq(airmiles)==69)),transfer=
list(c(1,1)))

Coefficients:
          ma1     sma1   Dec96    Jan97   Dec02  I911-AR1  I911-MA0  I911-MA1
      -0.3601  -0.6130  0.0949  -0.0840  0.0802    0.8094   -0.3660    0.0741
s.e.   0.0926   0.1261  0.0222   0.0229  0.0194    0.0924    0.0233    0.0424

sigma^2 estimated as 0.000648: log likelihood=221.76, aic=-427.52

Note that the parameter estimates of this model are similar to those of the previous
model but this model has a better fit, which may happen as the optimization is done
numerically.
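
The correspondence between the two parameterizations can be checked with a few lines of arithmetic. The sketch below takes the estimates printed for the first fit, converts them to the implied ARMA(1,1) coefficients, and verifies on a hypothetical pulse P (length 30, pulse at position 5, chosen only for illustration) that both forms give the same response. The filter function of the stats package is used; the object names are our own.

```r
w0 <- -0.0949    # I911-MA0 from the first fit
w2 <-  0.8139    # I911.1-AR1
w1 <- -0.2715    # I911.1-MA0

# Implied ARMA(1,1) numerator: (w0 + w1) - w0*w2*B
ma0 <- w0 + w1
ma1 <- -w0 * w2
print(round(c(AR1 = w2, MA0 = ma0, MA1 = ma1), 4))

# Numerical check on a pulse indicator.
P <- 1 * (seq_len(30) == 5)
two_filters <- w0 * P + w1 * stats::filter(P, filter = w2, method = "recursive")
numerator   <- ma0 * P + ma1 * c(0, P[-length(P)])   # MA part, with P_0 taken as 0
one_filter  <- stats::filter(numerator, filter = w2, method = "recursive")
print(isTRUE(all.equal(as.numeric(two_filters), as.numeric(one_filter))))
```

The implied values, MA0 = -0.3664 and MA1 of about 0.0772, are close to the estimates -0.3660 and 0.0741 printed for the second fit.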

# Exhibit 11.8 on page 256.

Nine11p=1*(seq(airmiles)==69)

This defines the 9/11 indicator variable.

plot(ts(Nine11p*(-0.0949)+filter(Nine11p,filter=.8139,
method='recursive',side=1)*(-0.2715),
frequency=12,start=1996),type='h',ylab='9/11 Effects')

The command

Nine11p*(-0.0949)+filter(Nine11p,filter=.8139,
method='recursive',side=1)*(-0.2715)

computes the estimated transfer function. Note that the command
filter(Nine11p,filter=.8139,method='recursive',side=1)
computes $(1 - 0.8139B)^{-1}$Nine11p. The function filter performs an MA or AR
filtering on the input sequence passed to it as the first argument. Suppose the input is a
vector x = c(x1,x2,...,xn). Then the output y = c(y1,y2,...,yn) defined by the MA filter

$$y_t = c_0 x_t + c_1 x_{t-1} + \cdots + c_q x_{t-q}$$

can be computed by the command

filter(x,filter=c(c0,c1,...,cq),side=1).

The argument side=1 specifies that the MA operator works on current and past values
when computing an output value. To compute y1 (with q = 1, say), the value of x0 is needed.
Since the latter is not observed, the filter sets it to NA, and hence y1 is also NA. In this case, y2,
y3, and so forth can be computed. For an AR filtering with the output defined recursively
by the equation

$$y_t = x_t + c_1 y_{t-1} + \cdots + c_p y_{t-p}$$

the R command is

filter(x,filter=c(c1,c2,...,cp),method='recursive',
side=1)

Note that, unlike the case of the MA filter, the filter vector starts with c1 and there is no
c0 in the equation. The argument method='recursive' signifies an AR type of filtering.
For the AR filter, the initial values cannot be set to NA, lest all output values be
NA! The default initial values are zeros although other initial values may be specified via
the init argument.
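
A small numerical illustration of the two kinds of filtering follows. The input vector and the coefficient 0.5 are arbitrary choices made for this sketch; the argument is written with its full name, sides, of which side is an abbreviation accepted by R.

```r
x <- c(1, 2, 3, 4, 5)

# MA filter: y_t = x_t + 0.5 * x_{t-1}; y_1 needs the unobserved x_0, so it is NA.
ma_out <- stats::filter(x, filter = c(1, 0.5), sides = 1)
print(as.numeric(ma_out))

# AR (recursive) filter: y_t = x_t + 0.5 * y_{t-1}, with initial value y_0 = 0.
ar_out <- stats::filter(x, filter = 0.5, method = "recursive")
print(as.numeric(ar_out))
```

The first output begins with NA followed by 2.5, 4, 5.5, 7, whereas the recursive output starts at 1 and accumulates half of the preceding output at each step.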

abline(h=0)

adds a horizontal line with zero y-intercept.

# Exhibit 11.9 on page 259.
set.seed(12345)

y=arima.sim(model=list(ar=.8,ma=.5),n.start=158,n=100)

This simulates an ARMA(1,1) series of sample size 100. To remove transient effects of
the initial values, a burn-in of size 158 is specified. A large burn-in of the order of hundreds
should generally ensure that the simulated process is approximately stationary.
The number 158 is chosen for no particular good reason.

y[10]

This prints out the tenth simulated value.

y[10]=10

This alters the tenth value to be 10; that is, it becomes an additive outlier, mimicking the
effect of a clerical recording mistake, for example!

y=ts(y,freq=1,start=1); plot(y,type='o')
acf(y)
pacf(y)
eacf(y)

This exploratory analysis suggests an AR(1) model.

m1=arima(y,order=c(1,0,0)); m1; detectAO(m1)

This detects the presence of any additive outliers (AO) in the fitted AR(1) model. The
test requires an estimate of the standard deviation of the error (innovation) term, which
by default is estimated by a robust estimation scheme, resulting in a more powerful test.
The robust estimation scheme can be switched off by the argument robust=F, as illustrated
in the command below.

detectAO(m1,robust=F)

This verifies that a nonrobust procedure is less powerful.

detectIO(m1)

This detects the presence of any innovative outliers (IO) in the fitted AR(1) model. As
an AO is found in the tenth case, it is incorporated as an indicator covariate in the following
model.

m2=arima(y,order=c(1,0,0),xreg=data.frame(AO=seq(y)==10))
m2

# Exhibit 11.10 on page 260
data(co2)

m1.co2=arima(co2,order=c(0,1,1),
seasonal=list(order=c(0,1,1),period=12))
m1.co2

detectAO(m1.co2)
detectIO(m1.co2)

As an IO is found in the 57th data case, it is incorporated in the model.

m4.co2=arimax(co2,order=c(0,1,1),
seasonal=list(order=c(0,1,1),period=12),io=c(57))

The epochs of IOs are passed to the arimax function via the io argument, which
expects a list containing the positions of the IOs either as the time index of the IO or
as a vector in the form of c(year,month) that gives the year and month of the IO for
seasonal data; the latter format also works similarly for seasonal data of other types. For
a single IO, it is not necessary to enclose the single vector of index in a list before passing
it to the io argument.

# Exhibit 11.11 on page 262.
set.seed(12345)

X=rnorm(105)

Y=zlag(X,2)+.5*rnorm(105)

The command zlag(X,2) computes the second lag of X.

X=ts(X[-(1:5)],start=1,freq=1)

This omits the first five values of X and converts the remaining values to form a time
series.

Y=ts(Y[-(1:5)],start=1,freq=1)
ccf(X,Y,ylab='CCF')

This computes the cross-correlation function of X and Y. The ylab argument is supplied
in lieu of the default y-label of the ccf function, which is “ACF”.

# Exhibit 11.14 on page 264.
data(milk)

data(electricity)

milk.electricity=ts.intersect(milk,log(electricity))

The ts.intersect function merges several time series into a matrix (panel) of time
series over the time frame where each series has data. The object milk.electricity
is a matrix of two time series, the first column of which is the milk series and the
second the log of electricity, over the time period when these two series overlap.

plot(milk.electricity,yax.flip=T)

The option yax.flip=T flips the label for the y-axis for the series alternately so as to
make the labeling clearer.

# Exhibit 11.15 on page 265.

ccf(as.vector(milk.electricity[,1]),as.vector(milk.electricity[,2]),
main='milk & electricity',ylab='CCF')

The expression milk.electricity[,1] extracts the milk series and
milk.electricity[,2] the log electricity series.

The as.vector function strips the time series attribute from the time series. This
is done to nullify the default way that the ccf function plots the cross-correlations. You
may want to repeat the command without the as.vector function to see the default
labels of the lags according to the period of the data.

ccf((milk.electricity[,1]),(milk.electricity[,2]),
main='milk & electricity',ylab='CCF')

The bracket operator extracts a submatrix from a matrix, say M, in the form of
M[v1,v2], where v1 indicates which rows are kept and v2 indicates which columns
are retained. Consequently, the submatrix M[v1,v2] contains all elements of M in the
intersection of the retained rows and columns. If v1 (v2) is missing, then all rows (columns)
are retained. Hence, M[,1] is simply the submatrix consisting of the first column
of M. However, R adopts the convention that a submatrix with a single row or
column is “demoted” to a vector; that is, it loses one dimension. This convention makes
sense in most cases. However, if you do matrix algebra in R, this convention may result
in strange error messages! To prevent automatic dimension reduction, use
M[v1,v2,drop=F]. Instead of specifying which rows or columns are to be retained
in the submatrix, you can specify which rows or columns are to be deleted by specifying
the negative of their positions. Or v1 (v2) can be specified as a Boolean vector, where
the positions to be retained (eliminated) are denoted by TRUE (FALSE).
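
The following sketch illustrates these subsetting rules on a small matrix. The matrix M and its column names are made up for the illustration.

```r
M <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))

v  <- M[, 1]                  # single column: demoted to a plain vector
m1 <- M[, 1, drop = FALSE]    # drop=FALSE keeps it as a 2 x 1 matrix
print(dim(v))                 # NULL, since v has lost its dimensions
print(dim(m1))

print(M[, -1])                        # delete the first column
print(M[, c(TRUE, FALSE, TRUE)])      # Boolean selection of columns a and c
```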

#  Exhibit  11.16  on  page  267. 
me . dif =ts . intersect (dif f (dif f (milk, 12 ) ) , 
dif f (dif f (log (electricity) , 12) ) ) 
prewhiten (as . vector (me . dif [, 1] ) , as .vector (me . dif [ , 2 ] ) , 
ylab= ' CCF 1 ) 

The  prewhiten  function  expects  two  time  series  input  via  the  x  and  y  arguments. 
Both  series  will  be  filtered  according  to  an  ARIMA  model.  The  ARIMA  model  can  be 
supplied  via  the  x .  model  argument  and  should  be  the  output  of  the  arima  function.  If 
no  ARIMA  model  is  supplied,  an  AR  model  will  be  fitted  to  the  x  series,  with  the  AR 
order  selected  by  minimizing  the  AIC.  The  prewhiten  function  computes  and  plots 
the  cross-correlation  function  (CCF)  of  the  residuals  of  the  x  series  and  those  of  the  y 
series  from  the  same  (supplied  or  fitted)  model. 
Below, we show how to implement the Jarque-Bera test for normality in two different
ways. First, we show the direct approach.

skewness(r.cref)

This computes the skewness of the r.cref series.

kurtosis(r.cref)

This computes the kurtosis of the data, measured as the excess over the value 3 of a
normal distribution.

length(r.cref)*skewness(r.cref)^2/6

The function length returns the length of the vector (time series) passed into it, so the
expression above computes the first part of the Jarque-Bera statistic.

length(r.cref)*kurtosis(r.cref)^2/24

This computes the second part of the Jarque-Bera statistic.

JB=length(r.cref)*(skewness(r.cref)^2/6+
kurtosis(r.cref)^2/24)

The object JB then contains the Jarque-Bera statistic, and the command JB prints out the
statistic. The command 1-pchisq(JB,df=2) computes the p-value of the
Jarque-Bera test for normality. The function pchisq computes the probability that a
chi-square random variable is less than or equal to the value in the first argument.
The df argument of the pchisq function specifies the degrees of freedom of
the chi-square distribution. Because the p-value equals the right tail area, it equals 1
minus the cumulative probability. Besides pchisq, other functions associated with the
chi-square distribution include qchisq, which computes quantiles; dchisq, which
computes the probability density; and rchisq, which simulates realizations from the
chi-square distribution. Use Help in R to learn more about these functions. Similar
functions are available for other probability distributions. Associated with the normal
distribution are rnorm, pnorm, dnorm, and qnorm. Check out the usages of the
relevant functions for the binomial (binom), Poisson, and other distributions.
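The direct approach can be tried without the r.cref data. The following is only a minimal sketch on simulated normal data. The helpers skew_coef and excess_kurt are hypothetical base-R stand-ins for the skewness and kurtosis functions used above, and they are assumed to use divisor-n moment estimates, so their values may differ slightly from those of the package functions.

```r
# Minimal base-R sketch of the Jarque-Bera computation on simulated data.
# skew_coef() and excess_kurt() are illustrative helpers, not package functions.
skew_coef <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5          # sample skewness (divisor n)
}
excess_kurt <- function(x) {
  z <- x - mean(x)
  mean(z^4) / mean(z^2)^2 - 3        # sample kurtosis minus 3
}

set.seed(1)
x <- rnorm(500)                      # stand-in for a return series
n <- length(x)
JB <- n * (skew_coef(x)^2 / 6 + excess_kurt(x)^2 / 24)
p.value <- 1 - pchisq(JB, df = 2)    # right-tail area of chi-square(2)
# Same quantity, computed directly as an upper tail probability:
p.value2 <- pchisq(JB, df = 2, lower.tail = FALSE)
```

The lower.tail=FALSE form avoids the subtraction from 1, which can lose precision when the p-value is extremely small.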

library(tseries)

This loads the tseries library, which contains a number of functions needed for the
analysis reported in this chapter. Run library(help=tseries) for more information
about the tseries package.

jarque.bera.test(r.cref)

This carries out the Jarque-Bera test for normality with the time series r.cref.

# Exhibit 12.9 on page 283.

McLeod.Li.test(y=r.cref)

This performs the McLeod-Li test for the presence of ARCH in the daily CREF returns. The
first two arguments of the function are object and y, respectively. For the test with
raw data, the time series is supplied to the function via the y argument. The function
then computes the Box-Ljung statistics with the autocorrelations of the squared data to
detect conditional heteroscedasticity. The test is carried out with the first m
autocorrelations of the squared data, with m ranging from 1 to the maximum lag specified
by the gof.lag argument. If the gof.lag argument is missing, the default is set to
10*log10(n), where n is the sample size.
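The computation described above can be approximated in base R. The sketch below applies Box.test to squared simulated data for each m. It illustrates the idea only and is not the McLeod.Li.test code, and rounding the default maximum lag down with floor is an assumption.

```r
# Sketch: Ljung-Box p-values on squared data for m = 1, ..., max.lag.
set.seed(2)
x <- rnorm(200)                              # simulated stand-in data
max.lag <- floor(10 * log10(length(x)))      # default-style maximum lag (floor assumed)
pvals <- sapply(1:max.lag, function(m)
  Box.test(x^2, lag = m, type = "Ljung-Box")$p.value)
round(pvals, 3)
```

Small p-values at several lags would point to conditional heteroscedasticity; for independent normal data such as these, no systematic pattern is expected.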

The McLeod-Li test can also be applied to residuals from an ARMA model fitted to
the data. For example, the US dollar/Hong Kong dollar exchange rate data were found to
admit an AR(1) + outlier model. The need for incorporating ARCH in the model for
the exchange rate data can be tested by the command

McLeod.Li.test(arima(hkrate,order=c(1,0,0),
xreg=data.frame(outlier1)))

Note that object is the first argument, so in the above command the fitted AR(1) +
outlier model is passed into the function. The function then computes the test statistics
based on the squared residuals from the fitted AR(1) + outlier model. If the object
argument is supplied, explicitly or implicitly, the y argument is ignored by the function
even if it is supplied. Remember that to apply the test to raw data, the y argument must
be supplied and the object argument omitted.

# Exhibit 12.11 on page 286.
set.seed(1235678)

garch01.sim=garch.sim(alpha=c(.01,.9),n=500)

The garch.sim function simulates a GARCH process, with the ARCH coefficients
supplied via the alpha argument and the GARCH coefficients via the beta argument.
The sample size is passed into the function via the n argument. In the example above,
alpha=c(.01,.9) specifies that the constant term is 0.01 and the ARCH(1)
coefficient equals 0.9. So, garch01.sim saves a realization from an ARCH(1) process.
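To make the ARCH(1) recursion concrete, here is a hand-written base-R sketch. The function sim_arch1 and its burn argument are hypothetical names introduced for illustration. It is not the garch.sim implementation, and with the same seed it will not reproduce the garch.sim output.

```r
# Sketch: simulate an ARCH(1) series r[t] = sigma[t] * e[t],
# with sigma[t]^2 = omega + alpha * r[t-1]^2 and standard normal e[t].
sim_arch1 <- function(n, omega, alpha, burn = 100) {
  e <- rnorm(n + burn)
  r <- numeric(n + burn)
  r[1] <- sqrt(omega / (1 - alpha)) * e[1]   # start at the stationary variance
  for (t in 2:(n + burn)) {
    sigma2 <- omega + alpha * r[t - 1]^2     # conditional variance
    r[t] <- sqrt(sigma2) * e[t]
  }
  r[(burn + 1):(n + burn)]                   # drop the burn-in values
}

set.seed(1235678)
arch1.sim <- sim_arch1(n = 500, omega = 0.01, alpha = 0.9)
```

Here omega plays the role of the constant term 0.01 and alpha that of the ARCH(1) coefficient 0.9.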

# Exhibit 12.25 on page 300.
m1=garch(x=r.cref,order=c(1,1))

This fits a GARCH(1,1) model to the r.cref series. The garch function estimates
a GARCH model by maximum likelihood. The time series is supplied to the function
via the x argument and the GARCH order via the order argument. The order takes
the form c(p,q), where p is the GARCH order and q the ARCH order.

summary(m1)

This summarizes the fitted GARCH(1,1) model. Ignore the Box-Ljung test results
reported in the summary, as the generalized portmanteau tests should be used; see the
book.

# Exhibit 12.29 on page 305.
gBox(m1,method='squared')

The gBox function computes the generalized portmanteau test for checking whether
there is any remaining heteroscedasticity in the residuals of a fitted GARCH model. It
requires supplying the fitted GARCH model from the garch function through the first
argument (the model argument). By default, the tests are carried out with the squared
residuals from the fitted GARCH model. To inspect absolute residuals, use the option
method='absolute'. By default, the test is carried out for the ACF for lags from 1 to,
say, K, where K runs from 1 to 20. The collection of K’s can be specified by the lags
argument. For example, to carry out the test for K ranging from 1 to 30, supply the
option lags=1:30.

gBox(m1,lags=20,plot=F,x=r.cref,method='squared')$pvalue

This prints out the p-value of the generalized portmanteau test with the squared
residuals and K = 20; that is, it tests for any residual heteroscedasticity based on the
first 20 lags of the ACF of the squared residuals from the fitted GARCH model. Plotting
is switched off by the plot=F option. The gBox function returns a list, an element of
which is named pvalue and contains the p-values of the test for each K. Thus, the
command prints out the p-value for the test with K = 20.

# Exhibit 12.30 on page 306.

acf(abs(residuals(m1)),na.action=na.omit)

As the initial residuals from a fitted GARCH model may be missing, it is essential to
instruct the ACF to omit all missing values through the argument
na.action=na.omit (the preferred action when encountering a missing value is to omit
it). If this argument is omitted, the acf function uses all data and will return missing
values if there are any missing data.

Overfitting the GARCH(1,2) model to the CREF returns can be carried out by the
following commands.

m2=garch(x=r.cref,order=c(1,2))
summary(m2,diagnostics=F)

The summary is based on the summary.garch function in the tseries package.
Note that the p-values of the Ljung-Box test from the summary are invalid; the
generalized portmanteau tests should be used instead. Hence, the diagnostics are turned
off.

AIC(m2)

This computes the AIC of the fitted GARCH model m2.

# Exhibit 12.31 on page 306.

gBox(m1,x=r.cref,method='absolute')

This carries out the generalized portmanteau test based on the absolute residuals.

shapiro.test(na.omit(residuals(m1)))

This computes the Shapiro-Wilk test for normality with the residuals from the fitted
model m1. The function na.omit strips all missing values from the residuals. Thus,
the test is carried out with the nonmissing residuals. Without preprocessing the
residuals by the na.omit function, the test may return a missing value if some of the
residuals are missing!

# Exhibit 12.32 on page 307.
plot((fitted(m1)[,1])^2,type='l',
ylab='conditional variance',xlab='t')

The fitted function is a smart function whose behavior depends on the
fitted model passed to it as the first argument. If the fitted model is output from the
garch function, the default output from the fitted function is a two-column matrix
whose first column contains the one-step-ahead conditional standard deviations. Hence,
their squares are the conditional variances. So (fitted(m1)[,1])^2 computes the
time series of estimated one-step-ahead conditional variances based on the model m1.

# Exhibit 13.3 on page 323.

The periodogram of a time series can be computed and plotted by the function
periodogram, into which the data are passed as its first argument.

sp=periodogram(y); abline(h=0);
axis(1,at=c(0.04167,.14583))

The function periodogram has several useful arguments. Setting log='yes' tells
R to plot on a log scale, whereas log='no' (the default) says to plot on a linear scale.
Other arguments for the plot function may be passed into the function to make better
graphs. The function axis draws an axis, with the first argument specifying the side on
which the axis is drawn. The sides are labeled from 1 to 4 starting from the bottom in a
clockwise direction. The vector of locations of the tick marks can be specified by the at
argument. The command above instructs R to draw an (additional) axis on the bottom of
the figure with tick marks placed at 0.04167 and 0.14583.
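As a rough illustration of what a periodogram shows, the sketch below computes one with the base-R fft function for a noise-free sum of two sinusoids. The series, its length of 96, and the scaling of the ordinates are assumptions made for this example (the tick locations 0.04167 and 0.14583 equal 4/96 and 14/96). The periodogram function itself may scale its output differently.

```r
# Sketch: raw periodogram of a noise-free sum of two sinusoids via the FFT.
n <- 96
tt <- 1:n
y <- 2 * cos(2 * pi * tt * 4 / n) + 3 * sin(2 * pi * tt * 14 / n)
freq <- (1:(n / 2)) / n                        # Fourier frequencies in (0, 0.5]
pgram <- (Mod(fft(y))^2 / n)[2:(n / 2 + 1)]    # ordinates (one common scaling)
peaks <- sort(freq[order(pgram, decreasing = TRUE)[1:2]])
peaks                                          # the two dominant frequencies
plot(freq, pgram, type = "h", xlab = "Frequency", ylab = "Periodogram")
abline(h = 0)
axis(1, at = peaks)                            # extra tick marks at the peaks
```

Because both frequencies are Fourier frequencies for n = 96, all of the power falls on exactly two ordinates.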

# Exhibit 13.9 on page 333.

theta=.9 # Reset theta for other MA(1) plots
ARMAspec(model=list(ma=-theta))

The function ARMAspec calculates and plots the theoretical spectral density function of
the ARMA model supplied to the function as the first argument. Recall that R uses the
plus convention in the MA specification, so the minus sign is added to theta. The format
of the model is the same as that for the arima function.
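For the MA(1) case, the theoretical spectral density has a simple closed form that can be coded directly. The helper ma1_spec below is a hypothetical name. It assumes the minus-sign convention Y[t] = e[t] - theta*e[t-1] and unit innovation variance by default, and the normalization used by ARMAspec may differ.

```r
# Sketch: theoretical spectral density of the MA(1) model
# Y[t] = e[t] - theta * e[t-1], with innovation variance sigma2.
ma1_spec <- function(f, theta, sigma2 = 1) {
  sigma2 * (1 + theta^2 - 2 * theta * cos(2 * pi * f))
}
theta <- 0.9
f <- seq(0, 0.5, by = 0.001)
plot(f, ma1_spec(f, theta), type = "l",
     xlab = "Frequency", ylab = "Spectral Density")
```

With theta = 0.9 the density rises from (1 - 0.9)^2 at frequency 0 to (1 + 0.9)^2 at frequency 0.5, so high frequencies dominate.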

# Exhibit 14.2 on page 353.

The spec function can estimate the spectral density function by locally averaging the
periodogram via some suitable kernel function. The function spec has several useful
arguments. Setting log='yes' tells R to plot on a log scale, whereas log='no' says
to plot on a linear scale. Data may be detrended (fitting a linear time trend) by setting
detrend=T, and tapering may be enforced by setting taper to some fraction between 0
and 0.5. The default options are taper=0 and detrend=F.

k=kernel('daniell',m=15)

Here, the object k contains the Daniell kernel function with halfwidth 15. Use Help in R
to learn more about the kernel function.

sp=spec(y,kernel=k,log='no',sub='',
xlab='Frequency',ylab='Smoothed Sample Spectral Density')

Specifying the kernel to be the Daniell kernel function instructs R to compute and plot
the spectral density estimate, where the estimate at a certain frequency is obtained by
averaging the current (raw) periodogram value, the neighboring 15 periodogram values
on its left, and another 15 periodogram values on its right. More or less local averaging
can be specified through the m argument in the kernel function.
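The averaging weights can be inspected directly from the kernel object. This sketch uses only base R: spec.pgram stands in for the spec function, and the simulated AR(1) series is an arbitrary stand-in for the data.

```r
# Sketch: the Daniell kernel with m = 15 puts equal weight 1/31 on 31 ordinates.
k <- kernel("daniell", m = 15)
weights <- c(rev(k$coef[-1]), k$coef)   # full symmetric set, offsets -15..15
length(weights)
range(weights)

set.seed(3)
y <- arima.sim(model = list(ar = 0.5), n = 200)   # simulated stand-in series
sp <- spec.pgram(y, kernel = k, taper = 0, plot = FALSE)
```

The coef component stores the weights for offsets 0 through m only, which is why the full set is rebuilt by mirroring.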

lines(sp$freq,ARMAspec(model=list(ar=phi),freq=sp$freq,
plot=F)$spec,lty='dotted')

This adds the theoretical spectral density function.

# Exhibits 14.11 and 14.12, page 364.

# Spectral analysis of simulated series
set.seed(271435)

n=100

phi1=1.5; phi2=-.75 # Reset parameter values to obtain
# Exhibits 14.13 & 14.14
y=arima.sim(model=list(ar=c(phi1,phi2)),n=n)

This simulates an AR(2) time series of length 100.

sp1=spec(y,spans=3,sub='',lty='dotted',xlab='Frequency',
ylab='Log(Estimated Spectral Density)')

This estimates the spectral density function using the modified Daniell kernel (the
default kernel when the kernel argument is missing and the spans argument is
supplied). The spans argument supplies the width of the kernel function; that is, it is
twice the m argument in the kernel function plus 1. Here, spans=3 specifies local
averaging of three consecutive periodogram values. Note that local averaging may be
repeated by passing a vector as the value of spans. For example, setting spans=c(3,5)
performs local averaging twice. The estimated function obtained by local averaging with
spans=3 is then averaged again locally with spans=5. Repeated averaging with a
modified Daniell (rectangular) kernel is similar to averaging using a bell-shaped kernel
due to the Central Limit effect.
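The effect of spans on the averaging weights can be checked in base R. The sketch assumes that the spec function passes spans on to spec.pgram, which converts it to kernel("modified.daniell", spans %/% 2). The helper full_weights is a hypothetical name.

```r
# Sketch: weights implied by spans in base R's spec.pgram.
k3  <- kernel("modified.daniell", 3 %/% 2)        # spans = 3
k35 <- kernel("modified.daniell", c(3, 5) %/% 2)  # spans = c(3, 5): two passes
full_weights <- function(k) c(rev(k$coef[-1]), k$coef)
full_weights(k3)     # 0.25 0.50 0.25
full_weights(k35)    # seven weights, tapering toward the ends
```

Note that the modified Daniell kernel halves the two end weights, so spans=3 averages three periodogram values with weights 1/4, 1/2, and 1/4 rather than equal weights. The combined kernel for spans=c(3,5) already looks bell-shaped.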

sp2=spec(y,spans=9,plot=F)

This computes the spectrum estimate using a wider window encompassing nine
periodogram values, without plotting because of the plot=F argument. The output of the
spec function is saved into an object named sp2.

sp3=spec(y,spans=15,plot=F)

This uses an even wider window. How many periodogram values are included in each
local averaging?

lines(sp2$freq,sp2$spec,lty='dashed')

This plots the smoother spectrum estimate (spans=9) as a dashed line.

lines(sp3$freq,sp3$spec,lty='dotdash')

This plots the smoothest spectrum estimate (spans=15) as a dotdash line.

f=seq(0.001,.5,by=.001)

This creates an arithmetic sequence starting from 0.001 and ending at 0.5, with
increments of 0.001, which is then saved into the object f.

lines(f,ARMAspec(model=list(ar=c(phi1,phi2)),freq=f,
plot=F)$spec,lty='solid')

This plots the theoretical spectral density function for the specified ARMA model as
connected line segments on top of the estimated spectral density plot.

# Exhibit 14.12 on page 365.

sp4=spec(y,method='ar',lty='dotted',xlab='Frequency',
ylab='Log(Estimated AR Spectral Density)')

This estimates the spectral density function using the theoretical spectral density
function of an AR model fitted to the data by minimizing the AIC.
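A comparable AR-based estimate is available in base R through spec.ar. The sketch below uses it as a stand-in for spec with method='ar', on a simulated AR(2) series; the object name sp.ar is arbitrary.

```r
# Sketch: AR-based spectrum estimate with base R's spec.ar; the AR order
# is chosen by AIC inside ar().
set.seed(271435)
y <- arima.sim(model = list(ar = c(1.5, -0.75)), n = 100)
sp.ar <- spec.ar(y, plot = FALSE)
sp.ar$method          # a label that includes the selected AR order
```

The selected order need not equal the true order 2, since it depends on the simulated sample.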

f=seq(0.001,.5,by=.001)

lines(f,ARMAspec(model=list(ar=c(phi1,phi2)),
freq=f,plot=F)$spec,lty='solid')

This plots the theoretical spectral density function.

sp4$method

This displays the order of the AR model selected.

# Exhibit 15.1 on page 386.
set.seed(2534567)
par(mfrow=c(3,2))

y=arima.sim(n=61,model=list(ar=c(1.6,-0.94),ma=-0.64))

This simulates an ARMA(2,1) series of sample size 61.

lagplot(y)

This plots the lagged regression plots, where the time series is plotted against its lags
and a smooth curve is superimposed on each scatter diagram. The smooth curves are
obtained by local linear fits to the data. By increasing the value specified in the nn
argument (default nn=0.7), the local fitting scheme uses more local data, resulting in a
smoother fit that is likely to be more biased but less variable due to more smoothing.
Conversely, decreasing the value in the nn argument leads to a rougher fit that is less
biased but more variable due to less smoothing. The smooth curve in the scatter diagram
of the time series response versus its lag j estimates the conditional mean response given
its lag j as a function of the value of the lag j of the response. By default, lagplot
plots the lagged regression plots for lags 1 to 6. More lags can be computed via the
lag.max argument. For instance, lag.max=12 computes the lagged regression plots
for lags 1 through 12. Note that the lagplot function requires the installation of the
locfit package of R.

# Exhibit 15.2 on page 387.
data(veilleux)

The dataset veilleux is a matrix consisting of two time series. Its first column is the
series of Didinium abundance and the second column the series of Paramecium
abundance, each counted every 12 hours. The basic time unit is days, so these are series
of frequency 2, as they are sampled twice per day.

predator=veilleux[,1]

This defines the predator series as the abundance series of Didinium.

plot(log(predator),lty=2,type='b',xlab='Day',
ylab='Log(predator)')

This plots the entire log-transformed predator series as a dashed line.

predator.eq=window(predator,start=c(7,1))

This subsets the “stationary” part of the predator series that appears to begin on the
seventh day of the experiment. Subsequent analyses of the predator series reported in the
text were done with the log transformation of this stationary subseries.

lines(log(predator.eq))

This draws the stationary part as a solid line.

index1=zlag(log(predator.eq),3)<=4.661

The command zlag(log(predator.eq),3) returns the lag 3 of the (log-transformed)
predator series. The expression zlag(log(predator.eq),3)<=4.661
computes a Boolean vector whose elements are TRUE if and only if the corresponding
element of the lag 3 of the predator series is less than or equal to 4.661. The Boolean
vector is saved in an object named index1. Other comparison operators, including >=,
>, <, and ==, can be used to compare the vectors on the two sides of the comparison
operator. In the example above, the left-hand side of <= is a vector, but its right-hand
side is a scalar! The discrepancy is resolved by the recycling rule: R replicates the
shorter vector repeatedly until it matches the length of the longer one. Note that the
equality operator is denoted by the double equal sign ==, as the single equal sign
represents the assignment operator!
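The comparison and recycling behavior described above can be seen on a small made-up vector. The lag is built by hand here and is assumed to mimic zlag by padding the start with NA.

```r
# Sketch: elementwise comparison with recycling, and a lag built by hand.
x <- c(4.2, 4.9, 4.5, 5.1, 4.4, 4.8)    # made-up values
lag3 <- c(rep(NA, 3), head(x, -3))      # x shifted back by three time points
index <- lag3 <= 4.661                  # the scalar 4.661 is recycled
index                                   # NA NA NA TRUE FALSE TRUE
x[which(index)]                         # values whose lag 3 is <= 4.661
```

The first three elements of index are NA because the lag 3 is undefined there; which() drops them when subsetting.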

points(y=log(predator.eq)[index1],(time(predator.eq))[index1],
pch=19)

This draws as solid circles (pch=19) those data points whose lag 3 of the predator
abundance is less than or equal to 4.661. Run the command ?points to learn other
styles for plotting data points.

# Tests for nonlinearity, page 390.

Keenan.test(sqrt(spots))

This carries out Keenan’s test for linearity. The working order of the AR process under
the null hypothesis of linearity can be supplied via the order argument. For example,
order=2 sets the working AR order to 2. If the order argument is missing, the order is
automatically determined by minimizing the AIC via the ar function. The ar function
by default estimates the models by solving the Yule-Walker equations, but other
estimation methods may be used by including the method argument when calling the
Keenan.test function; for example, method='mle' specifies using maximum
likelihood in the ar function.

Tsay.test(sqrt(spots)) # page 390

This implements Tsay’s test for linearity; see Tsay (1986). The design of the
Tsay.test function and its arguments are similar to those of the Keenan.test
function.

# Exhibit 15.6 on page 400.

y=qar.sim(n=100,const=0.0,phi0=3.97,
phi1=-3.97,sigma=0,init=.377)

The function qar.sim simulates a time series realization from a first-order quadratic
AR model, where phi0 is the coefficient of lag 1 and phi1 is that of the square of
lag 1. The default intercept is zero; otherwise, it can be set by the const argument. The
innovation standard deviation is passed into the function via the sigma argument. Here,
sigma=0 sets the standard deviation to 0, so the simulated series is deterministic. The
argument n=100 sets the sample size to 100. Finally, the argument init=.377 sets the
initial value to 0.377. The default initial value is 0.
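The recursion behind a first-order quadratic AR model is short enough to write out. The function sim_qar1 below is a hypothetical base-R stand-in that follows the argument names described above. It is a sketch of the model equation, not the qar.sim code, and details such as whether the initial value is included in the output are assumptions.

```r
# Sketch: first-order quadratic AR recursion
# y[t] = const + phi0*y[t-1] + phi1*y[t-1]^2 + sigma*e[t].
sim_qar1 <- function(n, const = 0, phi0 = 0, phi1 = 0, sigma = 1, init = 0) {
  y <- numeric(n)
  prev <- init
  for (t in 1:n) {
    y[t] <- const + phi0 * prev + phi1 * prev^2 + sigma * rnorm(1)
    prev <- y[t]
  }
  y
}
y.det <- sim_qar1(n = 100, phi0 = 3.97, phi1 = -3.97, sigma = 0, init = 0.377)
plot(x = 1:100, y = y.det, type = "l", ylab = expression(Y[t]), xlab = "t")
```

With phi0 = 3.97, phi1 = -3.97, and sigma = 0, the recursion reduces to the logistic map y[t] = 3.97*y[t-1]*(1 - y[t-1]), whose values stay between 0 and 1 yet look erratic.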

plot(x=1:100,y=y,type='l',ylab=expression(Y[t]),xlab='t')

The output of the qar.sim function is a vector. To draw the time sequence plot, both
the x-variable and the y-variable have to be specified.

# Exhibit 15.8 on page 411.
set.seed(1234579)

y=tar.sim(n=100,Phi1=c(0,0.5),Phi2=c(0,-1.8),p=1,d=1,
sigma1=1,thd=-1,sigma2=2)$y

The function tar.sim simulates time series realizations from a two-regime TAR
model. The order of the model is specified by the p argument, so p=1 specifies a
first-order model. The delay is passed into the function by the d argument, so d=1
specifies the delay to be 1. The AR coefficient vector for the lower (upper) regime, with
the intercept being the first component, is supplied via the Phi1 (Phi2) argument. The
thd=-1 argument sets the threshold parameter to -1. The innovation standard
deviations for the lower and upper regimes are specified via the sigma1 and sigma2
arguments, respectively. The simulated TAR model in the example is conditionally
heteroscedastic, as the innovation standard deviation for the upper regime is twice that
for the lower regime. The sample size is set to 100 by the n=100 argument.
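The two-regime mechanism can be sketched in a few lines of base R. The function sim_tar1 and its burn argument are hypothetical names. The sketch covers only the first-order case and is not the tar.sim implementation, so the same seed will not reproduce the series above.

```r
# Sketch: two-regime TAR(1) simulation with delay d.
# The lower regime applies when y[t-d] <= thd, the upper regime otherwise.
sim_tar1 <- function(n, Phi1, Phi2, thd, d = 1, sigma1 = 1, sigma2 = 1,
                     burn = 100) {
  total <- n + burn + d
  y <- numeric(total)                       # starts from zeros
  e <- rnorm(total)
  for (t in (d + 1):total) {
    if (y[t - d] <= thd) {
      y[t] <- Phi1[1] + Phi1[2] * y[t - 1] + sigma1 * e[t]
    } else {
      y[t] <- Phi2[1] + Phi2[2] * y[t - 1] + sigma2 * e[t]
    }
  }
  tail(y, n)                                # discard start-up values
}
set.seed(1234579)
y.tar <- sim_tar1(n = 100, Phi1 = c(0, 0.5), Phi2 = c(0, -1.8),
                  thd = -1, d = 1, sigma1 = 1, sigma2 = 2)
```

Although the upper-regime slope of -1.8 would be explosive on its own, a large positive value is sent to a large negative one, where the contracting lower regime takes over.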

The likelihood ratio test for threshold nonlinearity, assuming normally distributed
innovations, can be carried out by the tlrt function, into which the data are passed as
the first argument. Other required information includes the order (p) and delay (d)
arguments. Also, the threshold parameter must be searched over a finite interval,
from the 100a-th percentile to the 100b-th percentile of the data, where a and b are
supplied as arguments. Often, data have to be transformed before testing for
nonlinearity, which can be specified by supplying the transformed data or by supplying
the raw data with the transform argument set to one of the available options: 'no'
(no transformation, the default), 'log', 'log10', or 'sqrt'. For example, the following
command does the likelihood ratio test of the null hypothesis that the square root
transformation of the relative sunspot data is an AR(5) process versus the alternative
that it follows a threshold model with delay 1, order 5, and with the threshold parameter
searched from the first to the third quartile of the (transformed) data.

tlrt(sqrt(spots),p=5,d=1,a=0.25,b=0.75)

The tlrt function outputs a list containing the test statistic and its p-value. In practice,
the true delay of the threshold model is unknown, although it is likely to be between 1
and the order of the model. (The delay may be set to some value greater than the
order if this is deemed appropriate.) The command above can be repeated for each
possible delay value. A more elegant way is to use a for loop as follows.

# Tests for threshold nonlinearity, page 400.
pvaluem=NULL

This defines an empty object named pvaluem.

for (d in 1:5)
{res=tlrt(sqrt(spots),p=5,d=d,a=0.25,b=0.75); pvaluem=
cbind(pvaluem,c(d,res$test.statistic,res$p.value))}

The statements within the curly brackets are repeated for each value the variable d takes
sequentially from the vector 1:5, which contains the first five positive integers. Thus, d
is first set to 1, and the likelihood ratio test for threshold nonlinearity is carried out, with
its output stored in an object named res. The command
c(d,res$test.statistic,res$p.value) creates a vector containing the value 1, the
likelihood ratio test statistic, and its p-value. The vector so created is then appended to
the right-hand side of pvaluem to form a matrix. So, after the first pass through the loop,
pvaluem is a matrix consisting of the test results for d=1. Then the loop sets d to the
second value, namely 2; carries out the threshold likelihood ratio test for d=2; appends
the test results for d=2 to the right-hand side of pvaluem; and so forth until the loop
exhausts all possible values for d, and then R exits from the loop.

rownames(pvaluem)=c('d','test statistic','p-value')

This labels the rows of the pvaluem matrix, with the first row labeled as “d”, the
second “test statistic”, and the third row “p-value”.

round(pvaluem,3)

This prints out the matrix (table) of test results, with the numbers rounded to three
decimal places. Note that the computational efficiency of the R code above can be
improved by declaring pvaluem as a matrix with appropriate dimensions (for example,
pvaluem=matrix(NA,nrow=3,ncol=5)) in which the test results are saved.
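The preallocation idea can be sketched as follows. Because tlrt requires the TSA package and the spots data, the base-R Box.test function and simulated data are used here purely as stand-ins; only the loop structure is the point.

```r
# Sketch: fill a preallocated 3 x 5 matrix inside the loop instead of
# growing it with cbind. Box.test is used only as a stand-in test.
set.seed(4)
x <- rnorm(100)                                # simulated stand-in data
pvaluem <- matrix(NA, nrow = 3, ncol = 5)
for (d in 1:5) {
  res <- Box.test(x, lag = d, type = "Ljung-Box")
  pvaluem[, d] <- c(d, res$statistic, res$p.value)
}
rownames(pvaluem) <- c("d", "test statistic", "p-value")
round(pvaluem, 3)
```

Each pass writes one column in place, so R does not have to copy the growing matrix on every iteration.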

# Exhibit 15.12 on page 405.

predator.tar.1=tar(y=log(predator.eq),p1=4,p2=4,d=3,a=.1,
b=.9,print=T)

This fits a threshold model to the (log-transformed) predator.eq series with the
maximum AR order set to 4 for both the lower and upper regimes, d=3, and the threshold
parameter searched from the tenth to the ninetieth percentile. The fitted model is
printed out if the print argument is set to T. By default, the function uses the MAIC
(minimum AIC) method for estimation, with the AR orders estimated as well. Another
method of estimation is conditional least squares, which can be specified by
method='CLS', as illustrated in the next command.

In the command below, we repeat the estimation but using the CLS method. Note
that the CLS method does not estimate the AR orders of the two regimes. Instead, the
AR orders are set to the maximum orders specified through the p1 and p2 arguments!
That is why the values of p1 and p2 are set differently from the previous command and
are in fact set to the orders estimated from the model using the MAIC method.

tar(y=log(predator.eq),p1=1,p2=4,d=3,a=.1,b=.9,print=T,
method='CLS')

# Exhibit 15.13 on page 408.
tar.skeleton(predator.tar.1)

This computes the skeleton of a TAR model supplied as the first argument, with a
default sample size of 500 values after a burn-in of 500 values, and draws the time
sequence plot of the last 50 values of the skeleton. The TAR model is usually the output
of the tar function, passed in via the object argument. Alternatively, the model
parameters can be specified in a format similar to that of the tar.sim function. The
function also prints a summary statement on the long-run behavior of the skeleton.

# Exhibit 15.14 on page 408.
set.seed(356813)

plot(y=tar.sim(n=57,object=predator.tar.1)$y,x=1:57,
ylab=expression(Y[t]),xlab=expression(t),type='o')

This plots a simulated time series from the TAR(2;1,4) model fitted to the predator
series. The fitted model is supplied via the object argument.

# Exhibit 15.20 on page 414.
tsdiag(predator.tar.1,gof.lag=20)

This carries out several model diagnostics on the TAR(2;1,4) model fitted to the
predator series. The function plots a time sequence plot of the standardized residuals,
the residual ACF, and the p-value plots of the generalized portmanteau tests. The
argument gof.lag=20 specifies that the last two plots use a maximum lag of 20.

# Exhibit 15.21 on page 415.
qqnorm(predator.tar.1$std.res)

This plots the quantile-quantile normal score plot for the standardized residuals from the
TAR(2;1,4) model fitted to the predator series.

qqline(predator.tar.1$std.res)

This adds the reference line on the Q-Q plot.

# Exhibit 15.22 on page 417.
set.seed(2357125)

pred.predator=predict(predator.tar.1,n.ahead=60,
n.sim=1000)

This simulates time series from the conditional distribution of the future values given
the data and a threshold model (usually the output of the tar function, here
predator.tar.1), with a maximum forecast horizon of 60 steps ahead.
The point predictors and their 95% prediction limits are computed by simulation.
The simulation size is specified as n.sim=1000. The output of the predict function
is a list that contains the prediction means as a vector in the component (element) named
fit and the lower and upper prediction limits as a matrix in the pred.interval
component. The function predict is a smart function and recognizes that the first
argument is a TAR model, on the basis of which it computes the prediction. To learn
more about the predict function for TAR models, run ?predict.TAR. The
extension TAR signifies the particular predict function for processing prediction based
on a TAR model.
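The way prediction means and limits arise from simulation can be illustrated with a much simpler model. The sketch below uses an AR(1) model with known, made-up parameters in place of a fitted TAR model. The object names echo the components described above but are ordinary variables here, not the output of predict.

```r
# Sketch: simulation-based point predictions and 95% limits for an AR(1)
# model with known parameters (a simple stand-in for a fitted TAR model).
set.seed(5)
phi <- 0.6; sigma <- 1; y.last <- 2            # illustrative values
n.ahead <- 6; n.sim <- 1000
pred.matrix <- matrix(NA, nrow = n.sim, ncol = n.ahead)
for (i in 1:n.sim) {
  prev <- y.last
  for (h in 1:n.ahead) {
    prev <- phi * prev + sigma * rnorm(1)      # one simulated future step
    pred.matrix[i, h] <- prev
  }
}
fit <- colMeans(pred.matrix)                   # prediction means
pred.interval <- apply(pred.matrix, 2, quantile, probs = c(0.025, 0.975))
```

Column h of pred.matrix holds the simulated h-step-ahead values, so its mean and its 2.5% and 97.5% quantiles give the point prediction and the 95% prediction limits for that horizon.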

yy=ts(c(log(predator.eq),pred.predator$fit),frequency=2,
start=start(predator.eq))

This appends the point prediction values to the data.

plot(yy,type='n',
ylim=range(c(yy,pred.predator$pred.interval)),
ylab='Log Prey',xlab=expression(t))

This sets up a plot of the data and the predicted future values without actual plotting
(type='n'). We anticipate superimposing the prediction intervals, so the range of the
y-axis is set through the ylim argument to the vector containing the minimum
and maximum of the combined vector of the observed + predicted values (yy) and the
prediction limits (pred.predator$pred.interval), computed via the range
function.

lines(log(predator.eq))

This draws the data as a solid line.

lines(window(yy,start=end(predator.eq)+c(0,1)),lty=2)

This adds the curve of the predicted values as a dashed line.

lines(ts(pred.predator$pred.interval[2,],
start=end(predator.eq)+c(0,1),freq=2),lty=2)

This adds the upper prediction limits.

lines(ts(pred.predator$pred.interval[1,],
start=end(predator.eq)+c(0,1),freq=2),lty=2)

This adds the lower prediction limits.

# Exhibit 15.24 on page 419.
qqnorm(pred.predator$pred.matrix[,3])

The output of the predict function is a list that contains another component, named
pred.matrix, which is a matrix containing all simulated future values, with the first
column consisting of the simulated one-step-ahead values, the second column those of
the two-steps-ahead values, and so forth.

qqnorm(pred.predator$pred.matrix[,3])

This extracts all 1000 simulated three-steps-ahead values, which are then passed into the
qqnorm function to make the Q-Q normal scores plot for these data.

qqline(pred.predator$pred.matrix[,3])

This adds the reference straight line for checking the normality of the three-steps-ahead
conditional distribution.
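
The same normality check can be sketched on any matrix laid out like pred.matrix. In the sketch below, sim is a hypothetical stand-in (simulated random-walk paths, not output from predict.TAR), with one row per simulated path and one column per forecast horizon.

```r
# Hypothetical stand-in for pred.predator$pred.matrix (an assumption):
# each row is one simulated future path, each column one forecast horizon.
set.seed(2)
nsim <- 1000
steps <- 7
shocks <- matrix(rnorm(nsim * steps), nrow = nsim, ncol = steps)
sim <- t(apply(shocks, 1, cumsum))  # random-walk paths, nsim x steps

h3 <- sim[, 3]   # all simulated three-steps-ahead values
qqnorm(h3)       # Q-Q normal scores plot
qqline(h3)       # reference line computed from the same column
```

Passing the same column to both qqnorm and qqline matters, because the reference line is computed from the quartiles of whatever vector it receives.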

Finally,  here  is  a  listing  and  brief  description  of  all  the  new  or  enhanced  functions 
that  are  contained  in  the  TSA  package. 


New or Enhanced Functions in the TSA Library

Function
    Description

acf
    Computes and plots the sample autocorrelation function starting with lag 1.

arima
    This command has been amended to compute the AIC according to our definition.

arima.boot
    Bootstraps time series according to a fitted ARMA(p,d,q) model.

arimax
    Extends the arima function, allowing the incorporation of transfer functions
    and innovative and additive outliers.

ARMAspec
    Computes and plots the theoretical spectrum of an ARMA model.

armasubsets
    Finds "best subset" ARMA models.

BoxCox.ar
    Finds a power transformation so that the transformed time series is
    approximately an AR process with normal error terms.

detectAO
    Detects additive outliers in time series.

detectIO
    Detects innovative outliers in time series.

eacf
    Computes and displays the extended autocorrelation function of a time series.

garch.sim
    Simulates a GARCH process.

gBox
    Performs a goodness-of-fit test for fitted GARCH models.

harmonic
    Creates a matrix of the first m pairs of harmonic functions for fitting a
    harmonic trend (cosine-sine trend, Fourier regression) model with a time
    series response.

Keenan.test
    Carries out Keenan's test for nonlinearity against the null hypothesis that
    the time series follows some AR process.

kurtosis
    Calculates the (excess) coefficient of kurtosis.

lagplot
    Computes and plots nonparametric regression functions of a time series
    against its various lags.

periodogram
    Computes the periodogram of a time series.

LB.test
    Computes the Ljung-Box or Box-Pierce tests checking whether or not the
    residuals from an ARIMA model appear to be white noise.

McLeod.Li.test
    Performs the McLeod-Li test for conditional heteroscedasticity (ARCH).

plot.Arima
    Plots a time series and its predictions (forecasts) with 95% prediction
    bounds based on a fitted ARIMA model.

predict.TAR
    Calculates predictions based on a fitted TAR model. The errors are assumed
    to be normally distributed and the predictive distributions are
    approximated by simulation.

prewhiten
    Bivariate time series are prewhitened according to an AR model fitted to the
    x-component of the bivariate series. Alternatively, if an ARIMA model is
    provided, it is used to prewhiten both series. The CCF of the prewhitened
    bivariate series is then computed and plotted.

qar.sim
    Simulates a first-order quadratic AR model with normally distributed white
    noise error terms.

rstandard.Arima
    Computes internally standardized residuals from a fitted ARIMA model.

runs
    Tests the independence of a sequence of values by checking whether there are
    too many or too few runs above (or below) the median.

season
    Extracts season information from a time series and creates a vector of the
    season information. For example, for monthly data, the function outputs a
    vector containing the months of the data.

skewness
    Calculates the skewness coefficient of a dataset.

spec
    Allows the user to invoke either the spec.pgram function or the spec.ar
    function in the stats package. The seasonal attribute of the data, if it
    exists, is suppressed for our preferred way of presenting the output. Alters
    defaults to demean=T, detrend=F, taper=0, and permits plotting of confidence
    interval bands.

summary.armasubsets
    Summary method for class armasubsets, useful for ARMA subset selection.

tar
    Estimates a two-regime TAR model.

tar.sim
    Simulates a two-regime TAR model.

tar.skeleton
    Obtains the skeleton of a TAR model by suppressing the noise term in the TAR
    model.

tlrt
    Carries out the likelihood ratio test for threshold nonlinearity, with the
    null hypothesis being a normal AR process and the alternative hypothesis
    being a TAR model with homogeneous, normally distributed errors.

Tsay.test
    Carries out Tsay's test for quadratic nonlinearity in a time series.

tsdiag.Arima
    Modifies the tsdiag function of the stats package, suppressing initial
    residuals and displaying Bonferroni bounds. It also checks the condition for
    the validity of the chi-square asymptotics for the portmanteau tests.

tsdiag.TAR
    Displays the time series plot and the sample ACF of the standardized
    residuals. Also, portmanteau tests for detecting autocorrelations in the
    standardized residuals are computed and displayed.

zlag
    Computes the lag of a vector, with missing elements replaced by NA.
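
To make the idea of a lag padded with NA concrete, here is a rough base R sketch. The helper lag_na is a hypothetical name introduced only for illustration; it is not the TSA implementation of zlag, and its argument names are assumptions.

```r
# Hypothetical helper (not the TSA source): shift x back by d positions,
# filling the first d slots with NA so the result has the same length as x.
lag_na <- function(x, d = 1) {
  stopifnot(d >= 0)
  c(rep(NA, d), x)[seq_along(x)]
}

x <- c(1, 2, 3, 4)
lag1 <- lag_na(x)       # first element is NA, then x[1], x[2], x[3]
lag2 <- lag_na(x, 2)    # first two elements are NA

# A lagged copy can then serve as a regressor aligned with x.
cbind(x, lag1, lag2)
```

Keeping the output the same length as the input is what makes the lagged vector convenient to place beside the original series in a data frame or regression formula.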

Dataset Information

Filename/Variable(s)
    Description and Source
    Page(s)

airmiles
    Monthly U.S. airline passenger-miles: 01/1996-05/2005. Source:
    www.bts.gov/xml/air_traffic/src/index.xml#MonthlySystem
    Page(s): 249

airpass
    Monthly total international airline passengers from 01/1960-12/1971.
    Source: Box, G. E. P., Jenkins, G. M., and Reinsel, G. C., Time Series
    Analysis: Forecasting and Control, third edition, Prentice-Hall, Englewood
    Cliffs, NJ, 1994.
    Page(s): 104

beersales
    Monthly U.S. beer sales (in millions of barrels), 01/1975-12/1990. Source:
    Frees, E. W., Data Analysis Using Regression Models, Prentice-Hall,
    Englewood Cliffs, NJ, 1996.
    Page(s): 51

bluebird: (log.sales & price)
    Weekly unit sales of Bluebird standard potato chips (New Zealand) and their
    price for 104 weeks. From the website of Dr. Andrew Balemi. Source:
    www.stat.auckland.ac.nz/~balemi/Assn3.xls
    Page(s): 267

bluebirdlite: (log.sales & price)
    Weekly unit sales of Bluebird Lite potato chips (New Zealand) and their
    price for 104 weeks. From the website of Dr. Andrew Balemi. Source:
    www.stat.auckland.ac.nz/~balemi/Assn3.xls
    Page(s): 276

boardings: (log.boardings & log.price)
    Monthly public transit boardings (mostly buses and light rail), Denver,
    Colorado region, 08/2000-03/2006. Source: Personal communication from Lee
    Cryer, Project Manager, Regional Transportation District, Denver, Colorado.
    Denver gasoline prices were obtained from the Energy Information
    Administration, U.S. Department of Energy, Washington, D.C., at
    www.eia.doe.gov
    Page(s): 248, 271, 273

co2
    Monthly carbon dioxide levels in northern Canada, 01/1994-12/2004. Source:
    http://cdiac.ornl.gov/ftp/trends/co2/altsio.co2
    Page(s): 234, 234

color
    Color properties from 35 consecutive batches of an industrial chemical
    process. Source: Cryer, J. D. and Ryan, T. P., "The estimation of sigma for
    an X chart", Journal of Quality Technology, 22, No. 3, 187-192.
    Page(s): 3, 134, 147, 165, 176, 194

CREF
    Daily values of one unit of the CREF (College Retirement Equity Fund) Stock
    fund, 08/26/04-08/15/06. Source:
    www.tiaa-cref.org/performance/retirement/data/index.html
    Page(s): 278

cref.bond
    Daily values of one unit of the CREF (College Retirement Equity Fund) Bond
    fund, 08/26/04-08/15/06. Source:
    www.tiaa-cref.org/performance/retirement/data/index.html
    Page(s): 316

days
    Accounts receivable data. Number of days until a distributor of Winegard
    Company products pays their account. Source: Personal communication from
    Mark Selergren, Vice President, Winegard, Inc., Burlington, Iowa.
    Page(s): 147, 174, 217, 276

deere1
    82 consecutive values for the amount of deviation (in 0.000025 inch units)
    from a specified target value that an industrial machining process at
    Deere & Co. produced under certain specified operating conditions. Source:
    Personal communication from William F. Fulkerson, Deere & Co. Technical
    Center, Moline, Illinois.
    Page(s): 146, 275

deere2
    102 consecutive values for the amount of deviation (in 0.0000025 inch
    units) from a specified target value that another industrial machining
    process produced at Deere & Co. Source: Personal communication from
    William F. Fulkerson, Deere & Co. Technical Center, Moline, Illinois.
    Page(s): 146

deere3
    57 consecutive values from a complex machine tool at Deere & Co. The values
    given are deviations from a target value in units of ten millionths of an
    inch. The process employs a control mechanism that resets some of the
    parameters of the machine tool depending on the magnitude of deviation from
    target of the last item produced. Source: Personal communication from
    William F. Fulkerson, Deere & Co. Technical Center, Moline, Illinois.
    Page(s): 147, 174, 190, 217

eeg
    An electroencephalogram (EEG) is a noninvasive test used to detect and
    record the electrical activity generated in the brain. These data were
    measured at a frequency of 256 per second and came from a patient suffering
    a seizure. This is a portion of a series on the website of Professor
    Richard Smith, University of North Carolina. His source: Professors Mike
    West and Andrew Krystal, Duke University. Source:
    http://www.stat.unc.edu/faculty/rs/s133/Data/datadoc.html
    Page(s): 380

electricity
    Monthly U.S. electricity generation (in millions of kilowatt hours) of all
    types: coal, natural gas, nuclear, petroleum, and wind, 01/1973-12/2005.
    Source: www.eia.doe.gov/emeu/mer/elect.html
    Page(s): 99, 214, 247, 264, 380

euph
    A digitized sound file of about 0.4 seconds of a Bb just below middle C
    played on a euphonium by one of the authors (JDC), a member of the group
    Tempered Brass.
    Page(s): 374

flow
    Flow data (in cubic feet per second) for the Iowa River measured at
    Wapello, Iowa, for the period 09/1958-08/2006. Source:
    http://waterdata.usgs.gov/ia/nwis/sw
    Page(s): 372, 381

gold
    Daily price of gold (in U.S. dollars per troy ounce),
    01/04/2005-12/30/2005. Source: www.lbma.org.uk/2005dailygold.htm
    Page(s): 105

google
    Daily returns of Google stock from 08/20/04 to 09/13/06. Source:
    http://finance.yahoo.com/q/hp?s=GOOG
    Page(s): 317

hare
    Annual Canadian hare abundance, 1905-1935. Source: Stenseth, N. C.,
    Falck, W., Bjørnstad, O. N., and Krebs, C. J. (1997) "Population regulation
    in snowshoe hare and Canadian lynx: Asymmetric food web configurations
    between hare and lynx." Proceedings of the National Academy of Sciences,
    USA, 94, 5147-5152.
    Page(s): 4, 136, 152, 176, 206

hours
    Monthly average hours worked per week in the U.S. manufacturing sector for
    07/1982-06/1987. Source: Cryer, J. D., Time Series Analysis, Duxbury Press,
    Boston, 1986.
    Page(s): 51

JJ
    Quarterly earnings per share for 1960Q1-1980Q4 of the U.S. company Johnson
    & Johnson, Inc. From the web site of David Stoffer. Source:
    www.stat.pitt.edu/stoffer/tsa2/
    Page(s): 105, 248

larain
    Annual rainfall totals for Los Angeles, California, 1878-1992. Source:
    Personal communication from Professor Donald Bentley, Pomona College,
    Claremont, California. For more data see
    www.wrh.noaa.gov/lox/climate/cvc.php
    Page(s): 1, 49, 105, 133, 379

milk
    Monthly U.S. milk production from 01/1994 to 12/2005. Source: National
    Agricultural Statistics Service:
    usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1103
    Page(s): 264, 374, 374

oil.price
    Monthly spot price for crude oil, Cushing, OK (in U.S. dollars per barrel),
    01/1986-01/2006. U.S. Energy Information Administration. Source:
    tonto.eia.doe.gov/dnav/pet/hist/rwtcM.htm
    Page(s): 87, 125, 153, 177, 276, 317

oilfilters
    Monthly wholesale specialty oil filter sales, Deere & Co., 07/1983-06/1987.
    Source: Personal communication from William F. Fulkerson, Deere & Co.
    Technical Center, Moline, Illinois.
    Page(s): 6

prescrip
    Monthly U.S. average prescription costs for the months 08/1986-03/1992.
    Source: Frees, E. W., Data Analysis Using Regression Models, Prentice-Hall,
    Englewood Cliffs, NJ, 1996.
    Page(s): 52

retail
    Monthly total UK (United Kingdom) retail sales (non-food stores in billions
    of pounds), 01/1983-12/1987. Source:
    www.statistics.gov.uk/statbase/TSDdownload1.asp
    Page(s): 52

robot
    Final position in the "x" direction of an industrial robot put through a
    series of planned exercises many times. Source: Personal communication from
    William F. Fulkerson, Deere & Co. Technical Center, Moline, Illinois.
    Page(s): 147, 174, 190, 217, 370

SP
    Quarterly S&P Composite Index, 1936Q1-1977Q4. Source: Frees, E. W., Data
    Analysis Using Regression Models, Prentice-Hall, Englewood Cliffs, NJ, 1996.

spots
    Annual American (relative) sunspot numbers collected from 1945 to 2005. The
    annual (relative) sunspot number is a weighted average of solar activity
    measured from a network of observatories. Source:
    www.ngdc.noaa.gov/stp/SOLAR/ftpsunspotnumber.html#american
    Page(s): 392

spots1
    Annual international sunspot numbers, 1700-2005, NOAA National Geophysical
    Data Center. Source:
    ftp.ngdc.noaa.gov/STP/SOLAR_DATA/SUNSPOT_NUMBERS/YEARLY.PLT
    Page(s): 379

star
    Brightness of a variable star at midnight on 600 successive nights. Source:
    www.statsci.org/data/general/star.html
    Page(s): 325

tbone
    A digitized sound file of about 0.4 seconds of a Bb just below middle C
    played on a tenor trombone by Chuck Kreeb, a member of Tempered Brass and a
    friend of one of the authors.
    Page(s): 374

tempdub
    Monthly average temperatures in Dubuque, Iowa, 1/1964-12/1975. Source:
    http://mesonet.agron.iastate.edu/climodat/index.phtml?station=ia2364&report=16
    Page(s): 6, 213, 379

tuba
    A digitized sound file of about 0.4 seconds of a Bb an octave and one whole
    step below middle C played on a BBb tuba by Linda Fisher, a member of
    Tempered Brass and a friend of one of the authors.
    Page(s): 381

units
    Annual sales of certain large equipment, 1983-2005. (Proprietary sales data
    from a large international company.)
    Page(s): 276

usd.hkd
    Daily exchange rates of U.S. dollar to Hong Kong dollar, 01/2005-03/2006. A
    data frame with 431 observations on the following six variables.
        r: daily returns of USD/HKD exchange rates
        v: estimated conditional variances based on an AR(1)+GARCH(3,1)
        hkrate: daily USD/HKD exchange rates
        outlier1: dummy variable of day 203, corresponding to July 22, 2005
        outlier2: dummy variable of day 290, another possible outlier
        day: calendar day
    Source: www.oanda.com/convert/fxhistory
    Page(s): 310

veilleux: Day, Didinium, Paramecium
    A bivariate time series from an experiment studying prey-predator dynamics.
    The first time series consists of the number of prey individuals (Didinium
    nasutum) per ml measured every 12 hours over a period of 35 days. The
    second time series consists of the corresponding number of predators
    (Paramecium aurelia) per ml. Source: Veilleux, B. G. (1976) "The analysis
    of a predatory interaction between Didinium and Paramecium," MSc thesis,
    University of Alberta, Canada. See also
    www.journals.royalsoc.ac.uk/content/lekv0yqp2ecpabvd/archive1.pdf
    Page(s): 386

wages
    Monthly average hourly wages in the U.S. apparel industry: 07/1981-06/1987.
    Source: Cryer, J. D., Time Series Analysis, Duxbury Press, Boston, 1986.

Winnebago
    Monthly unit sales of recreational vehicles from Winnebago, Inc. from
    11/1966 to 02/1972. Source: Roberts, H. V., Data Analysis for Managers with
    Minitab, second edition, The Scientific Press, Redwood City, CA, 1991.